DOCUMENT RESUME

ED 123 678                                              CS 501 389

TITLE           Status Report on Speech Research: A Report on the
                Status and Progress of Studies on the Nature of
                Speech, Instrumentation for Its Investigation, and
                Practical Applications, January 1 - June 30, 1976.
INSTITUTION     Haskins Labs., New Haven, Conn.
REPORT NO       SR-45/46-(1976)
PUB DATE        76
NOTE            238p.

EDRS PRICE      MF-$0.83 HC-$1?.71 Plus Postage.
DESCRIPTORS     Articulation (Speech); Beginning Reading; Conference
                Reports; *Educational Research; Higher Education;
                Language Development; *Language Skills; *Oral
                Communication; *Speech; Speech Skills
IDENTIFIERS     *Status Reports

ABSTRACT

This report, covering the period of January 1 to June 30, 1976, is one
of a regular series on the status and progress of studies on the nature
of speech, instrumentation for its investigation, and practical
applications. The manuscripts and extended reports contained in this
report include "Exploring the Relations between Reading and Speech,"
"On Interpreting the Error Pattern in Beginning Reading," "Comments on
the Session: Perception and Production of Speech II; Conference on
Origins and Evolution of Language and Speech," "Consonant Environment
Specifies Vowel Identity," "What Information Enables a Listener to Map
a Talker's Vowel Space?" "Identification of Dichotic Fusions,"
"Discrimination of Dichotic Fusions," "Coperception: Two Further
Preliminary Studies," "'Posner's Paradigm' and Categorical Perception:
A Negative Study," "Weak Syllables in a Primitive Reading-Machine
Algorithm," "Control of Fundamental Frequency, Intensity, and Register
of Phonation," "The Effect of Delayed Auditory Feedback on Phonation:
An Electromyographic Study," "Some Aspects of Coarticulation," "The
Function of Strap Muscles in Speech," and "Laryngeal Muscle Activity in
Stuttering."






* Documents acquired by ERIC include many informal unpublished        *
* materials not available from other sources. ERIC makes every effort *
* to obtain the best copy available. Nevertheless, items of marginal  *
* reproducibility are often encountered and this affects the quality  *
* of the microfiche and hardcopy reproductions ERIC makes available   *
* via the ERIC Document Reproduction Service (EDRS). EDRS is not      *
* responsible for the quality of the original document. Reproductions *
* supplied by EDRS are the best that can be made from the original.   *



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING
IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT
OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.



SR-45/46 (1976)



Status Report on 

SPEECH RESEARCH 

A Report on 
the Status and Progress of Studies on
the Nature of Speech, Instrumentation
for its Investigation, and Practical
Applications



1 January - 30 June 1976






Haskins Laboratories
270 Crown Street
New Haven, Conn. 06510



Distribution of this document is unlimited.






(This document contains no information not freely available to the
general public. Haskins Laboratories distributes it primarily for
library use. Copies are available from the National Technical
Information Service or the ERIC Document Reproduction Service.
See the Appendix for order numbers of previous Status Reports.)






ACKNOWLEDGMENTS 



The research reported here was made possible in part by support
from the following sources:



National Institute of Dental Research 
Grant DE-01774 

National Institute of Child Health and Human Development
Grant HD-01994

Assistant Chief Medical Director for Research and Development,
Research Center for Prosthetics, Veterans Administration
Contract V101(134)P-342

Advanced Research Projects Agency, Information Processing
Technology Office, under contract with the Office of
Naval Research, Information Systems Branch
Contract N00014-76-C-0591

United States Army Electronics Command, Department of Defense
Contract DAAB03-73-C-0419(L 433) 



National Institute of Child Health and Human Development
Contract NO1-HD-1-2420

National Institutes of Health
General Research Support Grant RR-5596






HASKINS LABORATORIES 
Personnel In Speech Research 

Alvin Liberman,* President and Research Director
Franklin S. Cooper, Associate Research Director
Patrick W. Nye, Associate Research Director
Raymond C. Huey, Treasurer
Alice Dadourian, Secretary



Investigators 

Arthur S. Abramson*
Thomas Baer*
Peter Bailey¹
Fredericka Bell-Berti*
Gloria J. Borden*
James E. Cutting*
Ruth S. Day*
Michael F. Dorman*
Frances J. Freeman*
Jane Gaitenby
Thomas J. Gay*
Terry Halwes
Katherine S. Harris*
Alice Healy*
Isabelle Y. Liberman*
Leigh Lisker*
Ignatius G. Mattingly*
Paul Mermelstein
Seiji Niimi²
Lawrence J. Raphael*
Bruno H. Repp*
Philip E. Rubin*
Donald P. Shankweiler*
George N. Sholes
Michael Studdert-Kennedy*
Quentin Summerfield¹
Michael T. Turvey*



Technical and Support Staff

Eric Andreasson
Dorie Baker*
Elizabeth P. Clark
Cecilia C. Dewey
Donald S. Hailey
Harriet G. Kass*
Sabina D. Koroluk
Christina R. LaColla
Roderick M. McGuire
Agnes McKeon
Terry F. Montlick
Loretta J. Reiss
William P. Scully
Richard S. Sharkany
Edward R. Wiley
David Zeichner



Students* 



Mark J. Blechner
Steve Braddon
David Dechovitz
Susan Lea Donald
Donna Erickson
F. William Fischer
Hollis Fitch
Carol A. Fowler
Morey J. Kitzman
Gary Kuhn
Andrea G. Levitt
Roland Mandler
Leonard Mark
Robert F. Port
Sandra Prindle
Abigail Reilly
Robert Remez
Helen Simon
Emily Tobey
Harold Tzeutschler
James M. Vigorito



*Part-time

¹Visiting from The Queen's University of Belfast, Northern Ireland.

²Visiting from University of Tokyo, Japan.



CONTENTS

I. Manuscripts and Extended Reports

   Exploring the Relations between Reading and Speech -- Donald Shankweiler
   and Isabelle Y. Liberman

   On Interpreting the Error Pattern in Beginning Reading --
   Carol A. Fowler, Isabelle Y. Liberman, and Donald Shankweiler

   Comments on the Session: Perception and Production of Speech II;
   Conference on Origins and Evolution of Language and Speech --
   A. M. Liberman

   Consonant Environment Specifies Vowel Identity -- Winifred Strange,
   Robert R. Verbrugge, Donald P. Shankweiler, and Thomas R. Edman

   What Information Enables a Listener to Map a Talker's Vowel Space? --
   Robert R. Verbrugge, Winifred Strange, Donald P. Shankweiler, and
   Thomas R. Edman

   Identification of Dichotic Fusions -- Bruno H. Repp

   Discrimination of Dichotic Fusions -- Bruno H. Repp

   Coperception: Two Further Preliminary Studies -- Bruno H. Repp

   "Posner's Paradigm" and Categorical Perception: A Negative Study --
   Bruno H. Repp

   Weak Syllables in a Primitive Reading-Machine Algorithm --
   George Sholes

   Control of Fundamental Frequency, Intensity, and Register of
   Phonation -- Thomas Baer, Thomas Gay, and Seiji Niimi

   The Effect of Delayed Auditory Feedback on Phonation: An
   Electromyographic Study -- M. F. Dorman, F. J. Freeman, and G. J. Borden

   Some Aspects of Coarticulation -- Fredericka Bell-Berti and
   Katherine S. Harris

   The Function of Strap Muscles in Speech -- Donna Erickson and
   James E. Atkinson

   Laryngeal Muscle Activity in Stuttering -- Frances J. Freeman and
   Tatsujiro Ushijima

II. Publications and Reports

III. Appendix: DDC and ERIC numbers (SR-21/22 - SR-44)






I. MANUSCRIPTS AND EXTENDED REPORTS



Exploring the Relations between Reading and Speech*

Donald Shankweiler+ and Isabelle Y. Liberman+



ABSTRACT 

Acknowledgment of the priority of the spoken language and the
derivative nature of the writing system is an essential starting point
for an investigation of reading acquisition in children. The relations
between the language and the writing system are manifold and complex
so that spoken sounds and alphabetic characters cannot be related in a
one-to-one fashion. There is reason to believe that the phonetic level
of representation plays an especially significant role in the
acquisition of reading in the young child. We considered that a
primary function of a phonetic representation is to yield an adequate
span in working memory to permit linguistic interpretation of the
temporally arrayed segments of the message. Results of our studies of
short-term memory in good and poor readers suggested that the poor
reader is deficient in forming a phonetic representation from speech
as well as from script. In order to learn to read an alphabetically
written language, the child must have, in addition to a phonetically
organized short-term memory, the ability to make explicit the phonemic
segmentation of his own speech. The findings indicate that in contrast
to the tacit appreciation of phonemic differences in ordinary language
use, explicit knowledge of the phonemic level is difficult to attain.
Many children lack phonemic awareness when they start to learn to read
and this may be a cause of reading failure.



*To be published in Neuropsychology of Learning Disorders: Theoretical
Approaches, ed. by R. M. Knights and D. K. Bakker. (Baltimore: University
Park Press).

+Also University of Connecticut, Storrs.






Acknowledgment: This work reflects the joint efforts of several individuals.
The data were obtained by F. W. Fischer, C. Fowler, L. Mark, and M. Zifcak,
who also assumed responsibility for their tabulation and statistical analysis.
We are also indebted to A. M. Liberman, who suggested the hypothesis
concerning the functions of the phonetic representation. Full details of the
experiments on phonetic coding in recall will be presented in a paper in
preparation.

[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]



Given so little agreement on how best to teach children to read, it is
perhaps not surprising to find divergent conceptions of the nature of reading
itself. Among these, we find two contrasting positions concerning the
relationships between reading and speech. On the one hand, some writers (e.g.,
Goodman, 1968; Smith, 1973) have tended to ignore the relationship, choosing
instead to emphasize the relative autonomy of reading and writing. Their
counsel is, in effect, to forget about speech when teaching reading. A major
target of their criticism has been the so-called phonic approach to reading
instruction, which stresses the letter-to-sound mappings while failing to
appreciate that we cannot read simply by concatenating individual letter
sounds. On the other hand, we and a few other investigators (Huey, 1908;
Mattingly, 1972; Shankweiler and Liberman, 1972; Rozin and Gleitman, in press)
have emphasized the importance of the derivative nature of reading and writing
and the intimate connection between speech and the alphabet. In defending this
aspect of the study of reading, however, we give due weight to the complexity
of the relationship. We believe that many of the criticisms that have been
raised would apply only to a very simplistic view of how spoken sounds and
alphabetic characters are related.

Central to the understanding of how reading is acquired, in our view, is
the question of how reading builds on the speech processes of the child. We
know, of course, that spoken language is historically prior to reading and
writing in the development of the race, ontogenetically prior in the life of
the individual, and logically prior to the relation of written symbols to
their speech referents. Further evidence of the derivative status of writing
and reading and the practical importance of the priority of speech is readily
at hand. Consider the contrasting situations of the congenitally blind and the
congenitally deaf. The blind acquire spoken language normally; the profoundly
deaf, even under the most favorable conditions, are so effectively isolated
from language that they show severe deficiencies in every aspect of language
development (Furth, 1966). Since the blind child learns to read by means of
the substitute sense of touch, we may ask why the deaf child cannot
effectively exploit his intact visual channel for reading. Presumably he
cannot do so because deafness blocks the development of a foundation in
primary language so necessary as a basis for learning to read. If reading
were, as some have argued, an alternative and coequal language reception
system, then it would be hard to explain why the deaf could not learn language
by eye as readily as the hearing learn it by ear. Our interest, of course, is
to understand the acquisition of reading in children with intact sensory
capacities. We make reference to reading in the blind and the deaf only to
emphasize how closely reading is tied to speech.

If reading and speech are so closely linked, we would expect them to share
much of the same neural machinery. As Halwes (1968) has pointed out, it is
unparsimonious to imagine a completely parallel language understanding system
(for reading) that borrowed nothing from the primary speech system. Rather
than developing a separate device for reading, it would be more parsimonious
to expect that the would-be reader modifies the speech perception system to
accept optical information. We assume that the speech system works by mapping
the acoustic signal into progressively more abstract representations, and we
assume that the reading device must tie in with that system at some level. How
much visual processing must be done before script can be represented in the
common language processing system (as though the input had been speech rather
than script)? To put the question another way, what is the level of
representation at which script is recoded?



Certain facts about the writing system must constrain how we conceive of the
reading process. All writing systems make contact at some point with the
spoken language. Some, like Chinese and Japanese logographs, tie in at the
level of words, others at the level of the syllable. Some (the alphabets) link
their primary symbols to distinctive aspects of the sound structure of the
language. In the case of English, there is good reason to believe that script
makes contact with the primary language system at more than one level. At
times, similarity of spelling may denote not similarity of sound, but
similarities of origin and root meaning, as in such word pairs as sign and
signal. Such cases are not uncommon. Moreover, the assignment of grammatical
class is sometimes preliminary to determining the correct phonetic form. To
use an example of Rozin and Gleitman (in press), the written word contract is
ambiguous until we know whether it functions as a noun or as a verb. The
correct phonetic representation of such ambiguous words cannot be fully
attained without reference to more molar representations. These observations
obviously constrain our choices when we attempt to model the perceptual system
in reading. Thus, we do not assume that the reader is tied to a rigid
hierarchy of successive processing stages. Rather, we suppose that the
transformation of script into speech occurs at a number of levels concurrently
and in parallel.

To recapitulate, the fundamental task of the beginning reader is to
construct a link between speech and the arbitrary signs of script. Although
the alphabet is roughly a cipher on the phonemes of speech, this does not
imply that learning to read is merely a matter of acquiring letter-to-sound
correspondence. English spelling does not fully reflect the phonetic facts of
the language, and at times seems deliberately to ignore them in order to
convey other kinds of information helpful to the easy comprehension of what is
read. We assume that the experienced reader learns to detect and to exploit
such multileveled representation, though the complexity of the orthography is
surely an added source of difficulty for the beginning reader.



FUNCTIONS OF THE PHONETIC REPRESENTATION




Although English spelling is not a faithful phonetic transcription, there
is reason to suppose that the phonetic level of representation plays an
especially significant role in the acquisition of reading in the young child.
Even in English the alphabet is largely keyed to the sound structure. Hence,
new words can be given at least an approximate pronunciation on first
encounter if the reader understands how the alphabet works. Obviously, the
reader must recode phonetically if he is to obtain the phonetic realization of
a new word. But what does he do with words and phrases he has read many times?
Does he in these cases construct a phonetic representation, or does he, as
some believe, bypass the phonetic level and go directly from visual shape to
meaning?


It seems likely that phonetic recoding might occur even with frequently
read materials, and that its persistence in older, more experienced readers is
not to be regarded merely as a habit that has ceased to be functional. The
possibility we are proposing is that the reader needs a phonetic base on which
to extract the message from its encipherment in script; that is, the normal
primary language processes of storing, indexing, and retrieving from the
dictionary in our heads are carried out by means of a phonetic code. Moreover,
in addition to the possibility that the dictionary may be indexed
phonetically, consider what cues we use to decode the syntax of the message.
Here we are aided by the rise and fall of the speech melody and its pattern of
rhythms and stresses. These are not given directly in script, and it may
require the mediation of an internal phonetic representation to enable the
reader to construct those prosodic features so necessary to comprehension
(Liberman, Shankweiler, Liberman, Fowler, and Fischer, in press).

Since the perceiver cannot process each message unit fully at the time of
its arrival, we may be sure that short-term memory is one of the primary
linguistic processes essential to comprehension of both written and spoken
language. The perceiver, whether functioning as reader or hearer, must hold a
sufficient number of shorter segments (words) in memory in order to apprehend
the longer segments (sentences). Obviously, if he had a span of only two
words, the perceiver's comprehension of connected discourse would be extremely
limited. But does the reader form a different kind of memory representation
than the hearer? Although we do not rule out the possibility that read words
can be held temporarily in some visual form, we have indicated reasons above
for supposing that the reader typically engages in recoding from script to
some phonetic form. [See Liberman, Mattingly, and Turvey (1972) for a fuller
exploration of the suggestion that the phonetic representation is uniquely
suited to the short-term storage requirements of language.]

Apart from these speculations, there is much relevant experimental evidence
for phonetic recoding. In many investigations it has been found that when
lists of letters or alphabetically written words are presented
orthographically to be read and remembered, the confusions in short-term
memory are based on phonetic rather than visual similarity (Sperling, 1963;
Conrad, 1964, 1972; Baddeley, 1966, 1968, 1970; Hintzman, 1967; Kintsch and
Buschke, 1969). From these findings, it has been inferred that the stimulus
items had been stored in phonetic form rather than in visual form. Conrad
(1972) has emphasized that the tendency to recode visually presented items
into phonetic form is so strong that subjects do this even in experimental
situations in which to do so penalizes recall.

There is evidence from a similar kind of experiment (Erickson, Mattingly,
and Turvey, 1973) that phonemic recoding occurs even when the linguistic
stimuli are not presented in an alphabetic form that represents the phonetic
structure, but in a form (the Japanese kanji characters) that represents the
semantic message more directly. Moreover, under some circumstances, even
nonlinguistic stimuli may be recoded into phonetic form and stored in that
form in short-term memory. In this connection, Conrad (1972) found that in
recall of pictures of common objects, the confusions of children aged six and
over were clearly based on the phonetic forms of the names of the objects
rather than on their visual or semantic characteristics.

To be sure, none of these experiments dealt with wholly natural reading
situations, since most involved the reading of isolated words and syllables
rather than connected text. They are nevertheless relevant to the assumption
that even the skilled reader might recode phonetically in order to gain an
advantage in short-term memory and to utilize the primary language processes
he already has available to him. It remains to be determined whether good and
poor readers among children in the early stages of reading acquisition are
distinguished by greater or lesser tendencies toward phonetic recoding.






PHONETIC RECODING IN GOOD AND POOR BEGINNING READERS

In view of the short-term memory requirements of the reading task and
evidence for the involvement of phonetic coding in short-term memory, we might
expect to find that those beginning readers who are progressing well and those
who are doing poorly will be further distinguished by the degree to which they
rely on phonetic recoding.

In exploring this possibility, we studied three groups of school children
nearing completion of the second year of elementary school who differed in
level of reading achievement as measured by the word recognition subtest of
the Wide Range Achievement Test (Jastak, Bijou, and Jastak, 1965). The first
group, the superior readers, comprised 17 children reading about two years
above their grade placement. The other two groups (whom we originally
designated marginal and poor readers) can be considered together as the "poor
readers" since their performances in these experiments were not significantly
different from each other. Together the poor readers included 29 children
averaging from one-half to a full year of reading retardation and roughly
equated with the superior readers in mean age and IQ.

The experimental procedure was similar to one devised by Conrad (1972) in
which the subject's performance is compared on recall of phonetically
confusable (rhyming) and nonconfusable (nonrhyming) letters. Our expectation
was that phonetically similar items would maximize phonetic confusability and
thus penalize recall in subjects who use the phonetic code in short-term
memory. Strings of five uppercase letters were presented tachistoscopically in
a simultaneous 3-sec exposure. Half were composed of rhyming consonants (drawn
from the set B C D G P T V Z) and half were composed of nonrhyming consonants
(drawn from the set H K L Q R S W Y).
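
As a rough illustration of how stimulus lists of this kind might be
assembled, the following Python sketch draws five-letter strings from the two
consonant sets named above. Only the letter sets and the string length come
from the text; the function names, the choice to sample without repeating a
letter within a string, the number of trials, and the random seed are
assumptions made for the example and are not details of the original
procedure.

```python
import random

# Consonant sets as given in the text.
RHYMING = list("BCDGPTVZ")      # phonetically confusable
NONRHYMING = list("HKLQRSWY")   # phonetically nonconfusable


def make_string(letters, length=5, rng=random):
    """Draw `length` distinct letters in random order.

    Sampling without repetition is an assumption; the report does not
    say whether a letter could recur within a string.
    """
    return rng.sample(letters, length)


def make_trials(n_per_type, seed=0):
    """Build a shuffled list of (condition, letters) trials, half of each type."""
    rng = random.Random(seed)
    trials = [("confusable", make_string(RHYMING, rng=rng))
              for _ in range(n_per_type)]
    trials += [("nonconfusable", make_string(NONRHYMING, rng=rng))
               for _ in range(n_per_type)]
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    for condition, letters in make_trials(2):
        print(condition, " ".join(letters))
```

Fixing the seed keeps a hypothetical list reproducible from one session to
the next, which matters if the same strings are to be shown to every child.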

The test was given twice: first with immediate recall, then with delayed
recall. In the first condition, recall was tested immediately after
presentation by having subjects print as many letters as could be recalled in
each letter string, in the order given. To make the task maximally sensitive
to the recall strategy, we then imposed a 15-sec delay between tachistoscopic
presentation and the response of writing down the string of letters. The
children were requested to sit quietly during the delay interval; no
intervening task was imposed. We have reason to believe that the subjects used
this period for rehearsal, since many were observed mouthing the syllables
silently.




The responses were scored in two ways, with and without regard to serial
position. In the first scoring procedure, only those items listed in the
correct serial position were counted correct. The second scoring procedure
credited any items that occurred in the stimulus set regardless of the order
in which they were written down. The pattern of results was remarkably
similar, given data derived from each method of scoring. Ability to recall in
correct serial order is apparently not the major factor that distinguishes
good and poor readers on this task.
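
The two scoring procedures can be sketched in a few lines of Python. This is
only one plausible reading of the description above: the function names and
the example response are hypothetical, and the rule that each stimulus letter
is credited at most once in the order-free score is an assumption, since the
text does not say how repeated or intruded letters were handled.

```python
def score_ordered(stimulus, response):
    """Count letters written in their correct serial position."""
    return sum(1 for s, r in zip(stimulus, response) if s == r)


def score_free(stimulus, response):
    """Count stimulus letters that appear anywhere in the response.

    Each stimulus letter is credited at most once (an assumption).
    """
    remaining = list(response)
    correct = 0
    for letter in stimulus:
        if letter in remaining:
            remaining.remove(letter)
            correct += 1
    return correct


# Hypothetical trial: two letters transposed and one letter substituted.
stimulus = list("BDGPT")
response = list("BGDPV")

errors_ordered = len(stimulus) - score_ordered(stimulus, response)
errors_free = len(stimulus) - score_free(stimulus, response)
print(errors_ordered, errors_free)  # 3 1
```

In this made-up trial the transposition of D and G costs two errors under
position scoring and none under order-free scoring, which is the kind of
difference the comparison of the two methods is meant to expose.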

As was to be expected, the phonetic characteristics of the items influenced
the rate of correct recall. This may be seen in Figure 1, which shows the
results summed over serial positions. The circles give the error rates for
strings of rhyming items (labeled "confusable"); the triangles give errors on
recall of the nonrhyming ("nonconfusable") strings. In all groups, there were
significantly more errors on recall of the confusable items. However, there
were notable differences in the effects of phonetic similarity on the recall
of children who differed in reading level. It is apparent from the figure that
the main differences are between superior readers and the other groups.

[Figure 1 appears here in the original: three panels (superior readers,
marginal readers, poor readers), each plotting errors for confusable and
nonconfusable strings under immediate and delayed recall.]

Figure 1: Mean recall errors summed over serial positions.

The net effect of phonetic confusability on recall was much greater in the
superior readers than in the others. It would be difficult to explain this
result by assuming that the groups differ merely in general memory capacity.
Superior readers were clearly better at recall on nonconfusable items than
were the poor readers, while, at the same time, failing to show a clear
advantage on the confusable items. We regard this as an interesting result. It
is a relatively easy matter to demonstrate that poor readers do less well than
good readers on a variety of language-dependent tasks. But here, by
manipulation of the phonetic characteristics of the items, we have virtually
eliminated the advantage of the superior readers.



As we said, recall was measured on half the trials immediately after
presentation of the display, and on the other half after a 15-sec delay.
Turning back to Figure 1, we see that delay magnified the penal effect of
phonetic confusability, but only in the superior readers. Figure 2 shows plots
of the error rates at each serial position. Viewing the results of the delay
condition (shown in the lower portion), we see that the superior readers are
sharply distinguished from the others in recall of nonconfusable items and
nearly indistinguishable in their recall of confusable items. Why should
imposing a delay between stimulation and recall affect good and poor readers
differently? Is it simply the case that good readers try harder and rehearse
the items more vigorously? Although we cannot be sure, we do not think that
vigor or rate of rehearsal is a factor that chiefly distinguishes good and
poor readers on this task. Certainly we know that the poor readers were
attempting to rehearse because they so often mouthed the items during the
interval (Liberman et al., in press).



We considered and rejected other explanations of the pattern of results
obtained by good and poor readers. (1) The difference between the groups
cannot easily be attributed to briefer memory span in the poor readers. Even
if it were generally true that poor readers have briefer spans, the
differential effect of phonetic similarity on recall performance by the two
groups would still require explanation. (2) To suppose that the poor readers
suffer mainly from a difficulty in reproducing the order of the items in the
memory set encounters the same difficulty. Moreover, as we said, the pattern
of results is much the same when the scoring credited the correct items in
each string regardless of serial position.

The interpretation we find most plausible and interesting is that the
results reflect genuine differences between good and poor readers in their use
of a phonetic code. Of course we cannot argue that phonetic coding is entirely
absent in the poor readers, since they demonstrated significant effects of
confusability, though of lesser magnitude. A weak or defective phonetic
representation in the poor readers could account for the failure of rehearsal
to be effective.



[Figure 2 appears here in the original: error rates by serial position for
superior, marginal, and poor readers, with confusable and nonconfusable items
plotted separately; the upper portion shows immediate recall and the lower
portion delayed recall.]

Figure 2: Recall data replotted as a function of serial position.






AN AUDITORY ANALOG AND ITS VISUAL COUNTERPART

In light of the foregoing results, it seemed reasonable to suppose that
poor readers may have a specific difficulty in constructing a phonetic
representation from script. Before we could accept this hypothesis, however,
we needed to find out what would happen when confusable and nonconfusable
items were presented by ear. Since phonetic coding is presumably inescapable
when speech material arrives auditorily, presentation by ear should force the
poor reader into a phonetic mode of information processing. If an important
component of his difficulty is a deficiency in recoding visual symbolic
material into phonetic form, then the phonetic similarity of auditorily
presented rhyming items should affect him as much (or as little) as it does
the superior readers. Quantitative differences in memory capacity between the
two groups may still show up in the general level of recall on the auditory
presentation, but the statistical interaction of reading level and phonetic
confusability should be diminished. If, on the other hand, the interaction
remained, then it would follow that the difference between good and poor
readers in regard to the use of a phonetic representation is not specifically
linked to the visual information channel.
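
The logic of this prediction can be made concrete with a small Python sketch
of the interaction contrast, that is, the difference between groups in the
size of the confusability effect. The function names are hypothetical, and the
error values below are invented placeholders chosen only to show the
arithmetic; they are not data from these experiments.

```python
def confusability_effect(errors):
    """Extra errors on confusable strings relative to nonconfusable ones."""
    return errors["confusable"] - errors["nonconfusable"]


def interaction(superior, poor):
    """Group difference in the size of the confusability effect."""
    return confusability_effect(superior) - confusability_effect(poor)


# Placeholder values for illustration only; NOT the report's results.
superior = {"confusable": 30.0, "nonconfusable": 10.0}
poor = {"confusable": 32.0, "nonconfusable": 25.0}

print(interaction(superior, poor))  # 13.0
```

On the hypothesis of a deficit specific to recoding from script, a contrast
of this kind should shrink toward zero when the items are presented by ear; a
contrast that stays about as large under auditory presentation would point to
a more general difficulty with the phonetic representation.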

Two new experiments were carried out on the same subjects in order to
clarify this important point. Since auditory presentation requires successive
input, a parallel experiment was designed with visual serial presentation.
Except in minor details, the results are like those previously obtained for
simultaneous presentation of the letters, and, to our surprise, the visual and
auditory experiments differed hardly at all in their results. The findings of
each experiment are displayed in Figure 3, which gives serial position curves
for recall of auditorily presented and visually presented items. As in the
earlier experiment, the performances of the groups representing the extremes
of reading ability differed mainly on the phonetically dissimilar items. Once
again, phonetic similarity produced a greater impact on the superior readers
than on the poor ones. It made practically no difference to the results
whether the items to be recalled were presented to the eye or to the ear.
Apparently, the crux of the difficulty for the poor reader on these tasks
cannot be pinpointed as specifically as we originally believed. Though poor
readers may indeed experience difficulty in the transformation of visual
features into phonetic ones, the root problem is more general.




These new experiments lead us to expect that differences between good and
poor readers will turn on their ability to determine and use a phonetic
representation and not merely on their ability to recode from script. We suspect
that individual differences in the availability of phonetic recoding strategies
on recall tasks may indicate limits of the reader's active awareness of those
aspects of language structure to which the alphabet is most directly keyed.
This is a possibility that we shall wish to explore directly. We turn now to
those aspects of cognitive development that are most relevant to use of an
alphabet.


WHAT A CHILD NEEDS TO "KNOW" IN ORDER TO USE AN ALPHABET TO FULL ADVANTAGE

The preliterate child brings to the task of learning to read considerable
competence in his spoken language. Our concern is to discover what additional
abilities he needs in order to become a reader. Bolinger (1968) places the
problem of the learner and the teacher of reading in proper perspective:




When a child who is already almost fully equipped with a language
comes to the task of reading, anything that will help him transfer
what he already knows to what he is expected to write and read is
priceless (p. 177).

We have argued that an efficient short-term memory system is a requirement
for good comprehension of language, both by eye and by ear, and that this
requirement is most efficiently met by a phonetic representation. Reading,
however, poses an additional requirement. The child must also have ready
conscious access to certain aspects of the contents of that memory; he must
have, in Mattingly's (1972) phrase, a degree of "linguistic awareness." In order
to realize fully the advantages of an alphabet, the user, child or adult, must
know quite explicitly what speech segments are represented by the strings of
letters (Liberman, 1973; Liberman et al., in press).

It is appropriate at this point to remind ourselves of the benefits that
alphabets confer. As we have said, a unique advantage is that each new word
does not have to be learned as if it were an ideographic character before it can
be read. That is, given a word that is already in his mental word store, the
reader can apprehend the word without specific instruction, though he has never
seen it before in print; or, given a word that he has never before seen or
heard, he can closely approximate its spoken form until its meaning can be
inferred from context or discovered later by asking someone about it. By
functioning, however roughly, as a surrogate for phonemes, the alphabet gives
its users immediate access to all items in a vast word store by means of a
highly economical symbol set.

The savings may be realized, however, only by the user who knows how the
alphabet works. As in all complex cognitive skills, alternative strategies are
possible. The very diversity of the orthographies that have developed during the
course of evolution of writing is testimony to the flexibility of the perceptual
apparatus. It is possible to read words written by an alphabet as though they
were logograms. Many children undoubtedly begin to read in this way. However,
the unique advantages of the alphabet are closed to the child who cannot use it
analytically; though he may translate the logograms into phonetic
representation, this will not help him to apprehend new words. In order to make
the alphabet work for him, the child has first to be able to make an explicit
analysis of the segments of spoken language. He has to be able to analyze speech
into words, syllables, and phonemes. The last mentioned is of particular
importance for users of an alphabet, because the phoneme is the principal point
at which the writing system meshes with the speech system.

When we speak of explicit knowledge of the segments in the spoken message,
we wish to make it very clear that something more is involved than the ordinary
competence required in language use. That is to say, a person may be a
completely adequate speaker-hearer of his language without having the dimmest
awareness that the spoken word bed contains three phoneme segments and bend
contains four. The immediate recognition of these as different words, failing
the ability to indicate that /n/ is the unshared segment, is an example of what
Polanyi (1964) has called "tacit knowledge." Such knowledge is sufficient, of
course, for comprehension of the spoken message. Writing and reading, on the
other hand, demand an additional analytic capability. Even before the advent of
writing, those who used speech poetically in songs and chant must have been able
to count syllables in order to form the meter, and been aware of the phonemic
level in order to make rhymes. Some such explicit knowledge of these properties
of speech is a precondition for understanding the alphabetic principle.

THE DIFFICULTIES OF MAKING SPEECH SEGMENTS EXPLICIT

Elsewhere (Liberman, 1971, 1973; Liberman, Shankweiler, Fischer, and
Carter, 1974) we have considered why awareness of the phoneme might be rather
difficult to attain. In brief, we referred to a fact about the acoustic
structure of speech. Consonants and vowels are not discretely present in the
signal, but are represented overlappingly in the syllable, a condition that has
been called "encodedness" (Liberman, Cooper, Shankweiler, and Studdert-Kennedy,
1967). As a consequence, the word dig, for example, has three phonetic segments
but only one acoustic segment. Analyzing an utterance into syllables, on the
other hand, may present a different and easier problem. We expect this to be
so because in most cases each syllable has a distinctive peak in acoustic
energy. The cue of auditory amplitude is a crude one that could not be used to
locate exact syllable boundaries, but it can serve to indicate to the listener
how many syllables there are in an utterance.
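
The idea that amplitude peaks can indicate the number of syllables without locating their boundaries can be illustrated with a small sketch. It assumes an amplitude envelope already sampled as a list of numbers; the envelopes, the threshold, and the function name `count_energy_peaks` are invented for illustration and do not come from the studies cited.

```python
def count_energy_peaks(envelope, threshold):
    """Count local maxima of an amplitude envelope that exceed a threshold.

    Each counted peak stands in for one syllable nucleus. Nothing here
    attempts to say where one syllable ends and the next begins.
    """
    peaks = 0
    for i in range(1, len(envelope) - 1):
        is_local_max = envelope[i - 1] < envelope[i] >= envelope[i + 1]
        if is_local_max and envelope[i] > threshold:
            peaks += 1
    return peaks


# Invented envelopes: one with two energy peaks, one with a single peak.
two_syllables = [0.0, 0.2, 0.8, 0.3, 0.1, 0.4, 0.9, 0.5, 0.0]
one_syllable = [0.0, 0.3, 0.7, 0.9, 0.6, 0.2, 0.0]

if __name__ == "__main__":
    print(count_energy_peaks(two_syllables, threshold=0.5))
    print(count_energy_peaks(one_syllable, threshold=0.5))
```

No comparable count is available for phonemes: a monosyllable such as dig would yield a single peak although it contains three phonetic segments.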

The merging of phones in the sound stream complicates the process of
discovery of the phonemic level of speech for the would-be reader. This is not
to say, of course, that the young child has difficulty differentiating word
pairs, such as bad and bat, that differ in only one phoneme. There is evidence
(Read, 1971) that children hear these differences quite as accurately as adults.
The problem is not, as many believe, to get the child to discriminate such word
pairs, but rather to lead him to appreciate that each of these words contains
three segments, and that they are alike in the first two and differ in the
third. This is a further example of the distinction we drew earlier between
tacit and explicit knowledge of the phonetic structure of language.

The encoded nature of the phonemes has another consequence that surely
contributes to the difficulty of learning to read analytically. It makes it
impossible to read by sounding out the letters one by one. In the example of
dig, used above, reading letter by letter gives, not "dig," but "duhihguh." In
order to learn to read analytically, one must instead discover how many of the
letter segments must be taken simultaneously into account in order to arrive at
the correct phonetic rendition. In the case of the word dig, there is reason to
believe the number would be three. But, in fact, there is no simple rule for
arriving at that number, and we suspect that learning to group the letters for
the purpose of proper phonetic recoding is one of the really significant skills
one must acquire. Thus, even in languages such as Finnish and Spanish in which
the writing system closely approximates one-to-one correspondences between
letters and phonemes, reading cannot be a simple matter of association between
alphabetic characters and spoken sounds. In order to recover the spoken form,
the reader must still "chunk" all the letters that represent the phonetic
segments encoded into each syllable. In the case of reading a word in isolation,
the coding unit is probably the syllable. In reading connected text, the number
of letters that must be apprehended before recovery of the spoken form may at
times be quite large, for reasons we have discussed. We do not know how the
coding unit may vary with the prosody of the text and the reader's experience,
but we may be sure that such units almost always exceed one letter in length.
Therefore, we would stress that making analytic use of an alphabet does not mean
reading letter-by-letter.




The foregoing discussion has stressed that explicit awareness of the
phonetic structure of utterances is a very different thing from the ability to
distinguish words whose phonetic structure differs minimally. The latter is
easy for every normal child of school age, whereas the difficulty of explicit
analysis has been noted by a number of researchers (Bloomfield, 1942; Rosner
and Simon, 1971; Calfee, Chapman, and Venezky, 1972; Savin, 1972; Elkonin,
1973; Gleitman and Rozin, 1973). However, there had been no experiments
designed to demonstrate directly that phonetic segmentation is more difficult
for young children than syllabic segmentation, and that the ability to do it
might develop later.

DEVELOPMENT OF THE AWARENESS OF SPEECH SEGMENTS IN THE YOUNG CHILD

Recently, we (Liberman et al., 1974) investigated the development of the
ability to analyze words explicitly into syllables and phonemes. The task was
posed to the child subjects in the guise of a tapping game, in which segments
had to be indicated by the number of taps. We found steep age trends for
analysis of words into each kind of segment, but, at each age, test words were
more readily segmented into syllables than into phonemes. At age four, none of
the children in our sample could segment by phoneme (according to the criterion
we adopted), while nearly 50 percent could segment by syllable. Even at age six,
only 70 percent succeeded in phoneme segmentation, whereas 90 percent were
successful in the syllable task.
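
The size of the syllable advantage at the two ages mentioned above can be worked out directly. This is a minimal sketch using only the percentages stated in the preceding paragraph; the figure of 50 for four-year-olds stands in for "nearly 50 percent" and is therefore an approximation, and the names `pass_rate` and `syllable_advantage` are hypothetical.

```python
# Percentage of children meeting the criterion, as stated in the text.
# The age-four syllable value is an approximation of "nearly 50 percent".
pass_rate = {
    4: {"phoneme": 0, "syllable": 50},
    6: {"phoneme": 70, "syllable": 90},
}


def syllable_advantage(age):
    """Percentage-point gap between syllable and phoneme segmentation at one age."""
    return pass_rate[age]["syllable"] - pass_rate[age]["phoneme"]


if __name__ == "__main__":
    for age in sorted(pass_rate):
        print(age, syllable_advantage(age))
```

On these figures the gap narrows with age but does not close, which is the pattern the paragraph describes.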

Further research is needed to confirm and generalize these results. Since
the syllable is also the unit of metric scan, it is conceivable that the motor
response of tapping is more compatible with analysis by syllable than with
analysis by phoneme. An alternative procedure, designed by Goldstein (1974),
asks the child to indicate the number of segments in test words by counting out
tokens, thus limiting rhythmic motor responses that might bias the outcome in
favor of the syllable. Goldstein's preliminary work with this alternative
procedure confirmed that phoneme segmentation is genuinely more difficult than
syllable segmentation.

We hope eventually to clarify the meaning of the age trends we found. On
the one hand, the increase in ability to segment phonetically might result from
the reading instruction that typically begins between ages five and six.
Alternatively, it might be a manifestation of cognitive growth not specifically
dependent on training. The latter possibility could be tested by a developmental
study of segmentation skills in a language community such as the Chinese,
where the orthographic unit is the word and where reading instruction therefore
does not demand the kind of phonetic analysis needed in an alphabetic system.

SEGMENTATION AND READING ACQUISITION

There is some evidence that the difficulties of phoneme segmentation may be
related to problems of early reading acquisition. Such a relation can be
inferred from the observation that children who are resistant to early reading
instruction have problems even with spoken language when they are required to
perform tasks demanding some rather explicit understanding of phonetic
structure. Such children are reported (Monroe, 1932; Savin, 1972) to be
deficient in rhyming, in recognizing that two different monosyllables may share
the same first (or last) phoneme segment, and also in playing certain speech
games, which require a shift of the initial consonant segment of a word to a
nonsense syllable suffix.

In our segmentation experiment, we noted a sharp increase in the number of
children passing the phoneme-segmentation task, from only 17 percent at age
five to 70 percent at age six. Hence, the steepest rise in segmentation ability
coincides with the first intensive concentration on reading-related skills in
the schooling of the child. This result, together with the observations on the
lack of "transparency" of the phoneme to which we referred earlier, suggests a
connection between phonetic segmentation ability and early reading acquisition.
In a pilot study, we have begun to explore this relation. We measured the
reading achievement of the children who had taken part in our experiment on
phonemic segmentation described above. Testing at the beginning of the second
school year, we found that half the children in the lowest third of the class
in reading achievement, as measured by the word-recognition task of the Wide
Range Achievement Test (Jastak et al., 1965), had failed the phoneme
segmentation task the previous June; on the other hand, there were no failures
in phoneme segmentation among the children who scored in the top third in
reading ability.

We are hopeful that studies of preschool children's ability to segment
speech may shed some light on the matter of reading readiness. We plan to
examine the pattern of reading errors in children at different levels of reading
ability in relation to their ability to indicate the segments of spoken speech.
If the indications of our pilot work are borne out, failure on both the
syllable and the phoneme tasks at the first-grade level will be prognostic of
extreme reading difficulty.

SUMMARY AND CONCLUSIONS

We believe the priority of spoken language and the derivative nature of
reading and writing are the starting points for any understanding of the nature
of writing systems and their acquisition. Reading, however, presents special
problems for the perceiver, the nature of which reflects the manner in which the
writing system makes contact with the primary speech system. In the case of
English, the ties between the language and its spelling are based only partly
on the sound structure. Nevertheless, it is particularly appropriate to direct
the child's attention to the phonemic level, because the phonemic
correspondences are the entry points to any alphabetic writing system.

We considered that a primary function of a phonetic representation, whether
for the listener or the reader, is to yield an adequate span in working memory,
to permit linguistic interpretation of the temporally arrayed segments of the
message. Results of our studies of short-term memory in good and poor readers
suggested that the poor reader is deficient in forming a phonetic
representation from speech as well as from script.

In order to learn to read an alphabetically written language, the
availability of a phonetically organized short-term memory is not sufficient. In
addition, the child must have the ability to make explicit the segmentation of
his own speech, particularly at the level of the phoneme. Data were presented
indicating that explicit knowledge of the phonetic level is difficult to attain
in contrast to the tacit appreciation of phonemic differences reflected in
ordinary language use. We and others have noted that phonemic awareness is
lacking in many children when they start to learn to read, and may be a cause
of reading failure. In sum, the relations between speech and reading are both
intimate and subtle. It would seem appropriate for the early instruction in
reading to place initial stress on making the child aware of the speech segments
he will eventually learn to represent by written signs.

REFERENCES 

Baddeley, A. D. (1966) Short-term memory for word sequences as a function of
    acoustic, semantic, and formal similarity. Quart. J. Exp. Psychol. 18,
    362-365.
Baddeley, A. D. (1968) How does acoustic similarity influence short-term
    memory? Quart. J. Exp. Psychol. 20, 249-264.
Baddeley, A. D. (1970) Effects of acoustic and semantic similarity on
    short-term paired-associate learning. Brit. J. Psychol. 61, 335-343.
Bloomfield, L. (1942) Linguistics and reading. Elementary English 18,
    123-130; 18, 183-186.
Bolinger, D. (1968) Aspects of Language. (New York: Harcourt, Brace &
    World).
Calfee, R., R. Chapman, and R. Venezky. (1972) How a child needs to think to
    learn to read. In Cognition in Learning and Memory, ed. by L. W. Gregg.
    (New York: Wiley).
Conrad, R. (1964) Acoustic confusions in immediate memory. Brit. J. Psychol.
    55, 75-84.
Conrad, R. (1972) Speech and reading. In Language by Ear and by Eye: The
    Relationships between Speech and Reading, ed. by J. F. Kavanagh and I. G.
    Mattingly. (Cambridge, Mass.: MIT Press).
Elkonin, D. B. (1973) U.S.S.R. In Comparative Reading, ed. by J. Downing.
    (New York: Macmillan).
Erickson, D., I. G. Mattingly, and M. T. Turvey. (1973) Phonetic activity in
    reading: An experiment with kanji. Haskins Laboratories Status Report
    on Speech Research SR-33, 137-156.
Furth, H. (1966) Thinking without Language: Psychological Implications of
    Deafness. (New York: The Free Press).
Gleitman, L. R. and P. Rozin. (1973) Teaching reading by use of a syllabary.
    Read. Res. Quart. 8, 447-483.
Goldstein, D. M. (1974) Learning to read and developmental changes in covert
    speech and in a word analysis and synthesis skill. Unpublished Ph.D.
    dissertation, University of Connecticut.
Goodman, K. S. (1968) The psycholinguistic nature of the reading process. In
    The Psycholinguistic Nature of the Reading Process, ed. by K. S. Goodman.
    (Detroit: Wayne State University Press).
Halwes, T. (1968) Comment. In Communicating by Language, ed. by J. F.
    Kavanagh. (Bethesda, Md.: NICHD), p. 160.
Hintzman, D. L. (1967) Articulatory coding in short-term memory. J. Verbal
    Learn. Verbal Behav. 6, 312-316.
Huey, E. B. (1908) The Psychology and Pedagogy of Reading. (New York:
    Macmillan).
Jastak, J., S. W. Bijou, and S. R. Jastak. (1965) Wide Range Achievement
    Test. (Wilmington, Del.: Guidance Associates).
Kintsch, W. and H. Buschke. (1969) Homophones and synonyms in short-term
    memory. J. Exp. Psychol. 80, 403-407.



Liberman, A. M., F. S. Cooper, D. Shankweiler, and M. Studdert-Kennedy. (1967)
    Perception of the speech code. Psychol. Rev. 74, 431-461.
Liberman, A. M., I. G. Mattingly, and M. T. Turvey. (1972) Language codes
    and memory codes. In Coding Processes in Human Memory, ed. by A. W.
    Melton and E. Martin. (Washington, D.C.: V. H. Winston & Sons).
Liberman, I. Y. (1971) Basic research in speech and lateralization of
    language: Some implications for reading disability. Bull. Orton Soc. 21,
    71-87.
Liberman, I. Y. (1973) Segmentation of the spoken word and reading
    acquisition. Bull. Orton Soc. 23, 65-77.
Liberman, I. Y., D. Shankweiler, F. W. Fischer, and B. Carter. (1974) Reading
    and the awareness of linguistic segments. J. Exp. Child Psychol. 18,
    201-212.
Liberman, I. Y., D. Shankweiler, A. M. Liberman, C. Fowler, and F. W. Fischer.
    (in press) Phonetic segmentation and recoding in the beginning reader.
    In Reading: Theory and Practice, ed. by A. S. Reber and D. Scarborough.
    (Hillsdale, N. J.: Lawrence Erlbaum Assoc.).
Mattingly, I. G. (1972) Reading: The linguistic process and linguistic
    awareness. In Language by Ear and by Eye: The Relationships between
    Speech and Reading, ed. by J. F. Kavanagh and I. G. Mattingly. (Cambridge,
    Mass.: MIT Press).
Monroe, M. (1932) Children Who Cannot Read. (Chicago: University of Chicago
    Press).
Polanyi, M. (1964) Personal Knowledge: Towards a Post-Critical Philosophy.
    (New York: Harper & Row).
Read, C. (1971) Pre-school children's knowledge of English phonology.
    Harvard Educ. Rev. 41, 1-34.
Rosner, J. and D. P. Simon. (1971) The auditory analysis test: An initial
    report. J. Learn. Dis. 4, 40-48.
Rozin, P. and L. R. Gleitman. (in press) The structure and acquisition of
    reading. In Reading: Theory and Practice, ed. by A. S. Reber and D.
    Scarborough. (Hillsdale, N. J.: Lawrence Erlbaum Assoc.).
Savin, H. B. (1972) What the child knows about speech when he starts to learn
    to read. In Language by Ear and by Eye: The Relationships between Speech
    and Reading, ed. by J. F. Kavanagh and I. G. Mattingly. (Cambridge, Mass.:
    MIT Press).
Shankweiler, D. and I. Y. Liberman. (1972) Misreading: A search for causes.
    In Language by Ear and by Eye: The Relationships between Speech and
    Reading, ed. by J. F. Kavanagh and I. G. Mattingly. (Cambridge, Mass.:
    MIT Press).
Smith, F. (1973) Psycholinguistics and Reading. (New York: Holt, Rinehart &
    Winston).
Sperling, G. (1963) A model for visual memory tasks. Human Factors 5, 19-31.



On Interpreting the Error Pattern in Beginning Reading 

Carol A. Fowler,* Isabella Y. Liberman,* and Donald Shankweiler*

ABSTRACT 

The error pattern in beginning reading was examined from two
perspectives: the location of a misread consonant or vowel segment
within the syllable and the phonetic relationship between a consonant
or vowel and a misreading of it. The first analysis showed, as
earlier work had led us to expect, that consonants in the final
position in a syllable were more frequently misread than initial
consonants. In contrast, the position of a vowel within the syllable had
no effect on the frequency with which it was misread. With regard
to the second analysis, consonant errors were found to bear a close
phonetic relationship to their target sounds, while errors on vowels
were essentially unrelated, phonetically, to the vowel as written.
The striking differences, demonstrated by the results of both
analyses, between the consonants and the vowels were attributed to the
different linguistic functions of the two types of segments and to
their different representations in English orthography. These
findings underscore the importance of nonvisual, language-related
cognitive operations in reading acquisition.

By analyzing the errors that children make when they read, we can expect to
learn something about the underlying difficulties of reading acquisition.
However, analysis of beginners' errors can be enlightening only if the errors
form patterns, and then only if we can make sense of the patterns in terms of
what we know about those processes of language and perception on which the
development of reading must depend. Of course, patterns do not reveal themselves
automatically. Suitable strategies for examining the errors must be chosen by
the investigator, and these naturally reflect one's views of the nature of the
problem. That is to say, the choice of strategies for analysis of misreadings
reflects our expectations and biases concerning what it is that makes learning
to read difficult.

It seems patent to us that many children who lag behind in reading
acquisition do not understand the nature of the link between the writing system
and the language they already command in speech. Our research has therefore been
directed to the problems the child encounters in mapping the letter signs of the
written word to the linguistic segments of the spoken word. For this purpose,
we have chosen to focus on the child's error pattern in reading isolated words
rather than his reading of words in connected text. Our major reason for



*Also University of Connecticut, Storrs.

[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]






adopting this approach was a practical one: it is more feasible to assess a
child's analytic knowledge of the writing system when the materials used are as
free as they can be from the contextual cues supplied by ordinary meaningful
discourse. Empirical support for the validity of this approach is provided by
earlier studies (Shankweiler and Liberman, 1972) in which we found a high
correlation between children's ability to decode isolated words and their
ability to read meaningful, connected text with comprehension.

Given the word as the unit for investigation, our strategy was to examine
the beginner's misreadings from two perspectives: the location within the word
where errors most frequently occur, and the phonetic relationships between the
word as written and the child's incorrect renditions.

PHONOLOGICAL SEGMENTATION AND ERRORS IN BEGINNING READING

The first perspective was suggested to us by the results of an earlier
experiment (Shankweiler and Liberman, 1972). In that experiment, we observed
that the errors made by beginning readers did, in fact, show a pattern with
respect to location within the word. Thus we noted, as others had (Daniels and
Diack, 1956; Weber, 1970), that errors on final consonants far exceed those on
initial consonants in a consonant-vowel-consonant (CVC) syllable. Additionally,
we found that errors on medial vowels exceed errors on consonants in both the
initial and final positions.

To account for this observed distribution of errors, we adopted a line of
reasoning previously suggested by one of us (Liberman, 1971, 1973) in which it
was argued that if the child is to take full advantage of an alphabetic writing
system, he must be able to segment the spoken word into its component
phonological units. That is to say, he first has to recognize that the
continuous acoustic signal that constitutes the spoken word may be represented
as an ordered string of discrete phonological segments. Second, the child must
be able to identify explicitly the set of phonological segments that makes up a
given word. Only by so doing can he acquire and use the orthographic rules that
map these abstract units of sound onto their appropriate graphic
representations. It is not enough that the child merely be able to discriminate
words, such as bag and bat, which differ in one phoneme. Every normal child can
do that long before attaining reading age. In order to learn to use an alphabet
effectively, more is required than the perception of phonological differences.
The child needs to know explicitly that, in the example given, the words each
contain three segments and that they are alike in the first and second segments
and differ in the third (cf. Gibson and Levin, 1975 and Rozin and Gleitman, in
press, for extended discussions of this view).

Several recent investigations (Rosner and Simon, 1970; Calfee, Chapman, and
Venezky, 1972; Liberman, 1973; Liberman, Shankweiler, Fischer, and Carter, 1974)
of the phonological skills of young children have shown that many do indeed find
the task of segmenting the spoken word a difficult one. In our study (Liberman
et al., 1974), children in three age groups (nursery-preschool, kindergarten,
and first grade) were asked to indicate, by means of a tapping game we showed
them, the number of phonemes contained in each of a group of high-frequency
words. Most of the youngest children were unable to perform the task, as were
the majority of the kindergarteners. Even at the end of the first grade, 30
percent of the children failed. The first-grade children who failed in the
segmentation task had considerably more difficulty later in reading acquisition
than those who succeeded (Liberman, 1973).

In the light of these findings, it seemed reasonable to suppose that the
task of phonological segmentation might also vary in difficulty with the
position of a given segment in the syllable. That is, the initial sound in a
syllable should be easiest to isolate for the purpose of relating sound to
orthographic representation because it can be extracted without extensive
analysis of the syllable's sound structure. Obversely, the final segment would
be more difficult because just such an analysis would be required. The medial
sound might be the most difficult to analyze because it is entirely embedded
within the syllable. A report by Rosner and Simon (1970) seems to support these
conjectures: when a child is asked to reproduce an auditorily presented word,
but to leave out a specified consonant sound, he experiences the greatest
difficulty with the medial consonant sound and the least difficulty with the
initial sound.

One way to account for the error pattern observed in our earlier experiment,
then, is to consider that it reflects the differential difficulty that the
beginning reader experiences in segmenting sounds occurring in the initial,
medial, and final positions in the syllable. Such an account would attribute the
error difference we obtained between medial vowels, final consonants, and
initial consonants to the relative positions within the syllable occupied by the
different types of sounds and not to differences among the sound-types
themselves.

Although the data of our previous experiment (Shankweiler and Liberman,
1972) are consistent with such an interpretation, controls were lacking that
would enable us to rule out other possible interpretations. An adequate test of
the hypothesis would require first that the set of consonants occurring in
syllable-initial position be identical with the set that occurs in
syllable-final position. Additionally, it would require that the vowel also
occur in initial and final position, not only in the medial position, as was the
case in our earlier experiment. If, in a test designed to incorporate these
controls, errors on initial, medial, and final segments again rank as before,
then we can conclude with more assurance that the order of difficulty reflects a
true position effect for both consonants and vowels.

Accordingly, for the present experiment, we developed two word lists
designed to meet these requirements.¹ In List 1, the 19 consonant phonemes that
can occur in both the initial and in the final positions of a word appeared
twice in each position.² In List 2, the seven vowel phonemes that can occur in
the initial, medial, and final segment positions in a monosyllable appeared
three times in each position. The items composing both the vowel and the
consonant lists were monosyllabic words, which insofar as possible were familiar
to third-graders (Buckingham and Dolch, 1936) but were not "sight" words.



¹Ideally, it would have been desirable to provide both the consonant and the
vowel controls within one list. Contingencies relating to reading and vocabulary
level made this impossible to achieve.

²Medial consonants were excluded from the test list, as they had been in the
earlier experiment. Their inclusion would restrict us to a very small set of
consonants unless we allowed disyllables. Disyllables were avoided because we
did not wish to introduce problems of syllable segmentation into the reading
task.



The lists were presented in a single session.³ The order of list
presentation was balanced across subjects, and the order of words in each list was
randomized. The test words were printed with a black felt-tip pen on separate
unlined 3 x 5 file cards. The cards were placed face down in front of the subject
and were turned over one by one by the examiner. The subject was asked to read
each word as it was presented and to give his best guess if he did not know the
word. Responses were phonemically transcribed by the examiner and were recorded
on magnetic tape for later checking.
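As an illustration only, the presentation scheme described above (list order balanced across subjects, word order randomized within each list) might be sketched as follows. The function name, the alternation rule, the per-subject seeding, and the placeholder words are all assumptions made for the sketch; the text does not say how the balancing or the randomization was actually carried out.

```python
import random

def presentation_plan(subject_index, list1, list2, seed=0):
    """Return [(list_name, shuffled_words), ...] for one subject (hypothetical helper)."""
    rng = random.Random(seed + subject_index)  # reproducible per subject (assumption)
    order = [("List 1", list1), ("List 2", list2)]
    if subject_index % 2 == 1:
        order.reverse()  # odd-numbered subjects start with List 2 (assumed balancing rule)
    plan = []
    for name, words in order:
        shuffled = list(words)  # copy so the master list is left untouched
        rng.shuffle(shuffled)   # random word order within the list
        plan.append((name, shuffled))
    return plan

# Placeholder items only; the actual test words are not given in the text.
list1 = ["word_a", "word_b", "word_c"]
list2 = ["word_x", "word_y", "word_z"]
for subject in range(4):
    print(subject, presentation_plan(subject, list1, list2))
```

Alternating the starting list by subject number is just one simple way to balance order; any scheme that gives each order to half of the subjects would serve the same purpose.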



The subjects were children of the second, third, and fourth grades, 20 from
each grade, chosen alphabetically from the rosters of male and female students
in a public elementary school in Andover, Connecticut. Testing was done in the
late fall and early winter.



THE SEGMENT POSITION EFFECT IN CONSONANT ERRORS

The distribution of phoneme frequencies in English is not the same in
syllable-initial and in syllable-final segment positions. In order to control for
possible effects of this difference, List 1 was constructed so that the same set
of consonant phonemes appeared in each position. Despite this control, the
error difference obtained in our earlier experiments was replicated. As can be
seen in Table 1, final-consonant (FC) errors continue to exceed initial-consonant
(IC) errors. The direction of the difference is the same at every grade
level, and is consistent with the predicted rank ordering of difficulty of the
initial and final segments in the syllable.



TABLE 1: Errors on initial and final consonants and on medial vowels (List 1)
         presented as proportions of opportunity for error (decimal points
         omitted).

Grade    IC    FC    MVᵃ
  2      08    16    27
  3      05    10    15
  4      02    06    08

ᵃOccurrences not controlled.
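For readers who want to check the size of the position effect in Table 1, a minimal sketch follows. It simply divides the FC proportion by the IC proportion at each grade; the variable names are illustrative, and the values are the table entries expressed in hundredths.

```python
# Table 1 entries in hundredths (decimal points omitted, as in the table).
table1 = {
    2: {"IC": 8, "FC": 16, "MV": 27},
    3: {"IC": 5, "FC": 10, "MV": 15},
    4: {"IC": 2, "FC": 6, "MV": 8},
}

def fc_to_ic_ratio(row):
    """How many times more frequent FC errors are than IC errors."""
    return row["FC"] / row["IC"]

ratios = {grade: fc_to_ic_ratio(row) for grade, row in table1.items()}
for grade, ratio in ratios.items():
    print(f"grade {grade}: FC/IC = {ratio:.1f}")
```

On these figures the final consonant draws at least twice as many errors as the initial consonant at every grade.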



An analysis of variance performed on the data indicated that the effect of
consonant position was highly significant [F(1,57) = 44.80, p < .001]. As
expected, there was also an increase in performance level with grade [F(2,57) =
4.10, p < .025]. The grade-by-position interaction was not significant.

³A third list, used to study orthographic complexity, was presented at the same
time. It will be described in a later paper.

Although the identity of the phonemes occurring in each segment position of
the words was controlled in List 1, their orthographic representations were not
controlled. Therefore, a further analysis was performed to ascertain that the
larger FC error rates could not be ascribed to differences in the frequency or
ease of apprehension of the different sets of orthographic representations that
occur in the initial and final positions. For the purposes of this analysis,
orthographic complexity was defined in two ways. First, it was defined in terms
of the number of possible orthographic representations per phoneme. In this
sense, a phoneme that can be spelled in many ways is more complex than one with
few orthographic representations. Second, complexity was defined in terms of
the number of letters in each orthographic representation. For example, "tch"
would be more complex than "c." For the purposes of the following analysis,
both criteria were used; that is, a phoneme was considered orthographically
complex if it could be spelled in more than one way, but it also was considered
complex if its single orthographic representation consisted of more than one
letter. Based on these criteria, the consonant phonemes were separated into
"simple" and "complex" categories.
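The two-part complexity criterion just described can be written as a small decision rule. The sketch below is illustrative only: the function name and the spelling inventories are hypothetical and are not the inventories used in the study.

```python
def is_orthographically_complex(spellings):
    """Apply the two criteria to the list of spellings of one phoneme."""
    if len(spellings) > 1:        # criterion 1: more than one possible spelling
        return True
    return len(spellings[0]) > 1  # criterion 2: the single spelling has several letters

# Hypothetical spelling inventories, for illustration only.
example_spellings = {
    "b": ["b"],
    "k": ["c", "k", "ck"],
    "th": ["th"],
}

classification = {
    phoneme: "complex" if is_orthographically_complex(spellings) else "simple"
    for phoneme, spellings in example_spellings.items()
}
print(classification)
```

A phoneme is "simple" under this rule only when it has exactly one spelling and that spelling is a single letter.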

In Table 2, IC and FC errors in the simple and complex categories are
presented as proportions of opportunities for error. If orthographic complexity
were the basis for the FC/IC difference, removing those phonemes on which FCs
and ICs differ with respect to orthographic complexity should equalize the error
rates. However, the difference is present even in the "simple" category, whose
member phonemes are simple both in syllable-initial and syllable-final position
with respect to the indicated criteria.

TABLE 2: Errors on orthographically complex and simple sounds.ᵃ Errors
         presented as proportions of opportunity for error (decimal points
         omitted).

              Complex             Simple
Grade      IC        FC        IC        FC
  2        09        24        06        08
  3        06        13        03        07
  4    [illegible]   09        01        03



ᵃComplex: /f, j, k, m, s-, e, S, c, S, z/; simple: /b, d, g, l, n, p, t, r, v/.

Apparently then, neither phonemic distribution within the syllable nor
orthographic complexity can account for the FC/IC difference in error rate. The
difference, therefore, must be truly a position effect, that is, an effect of
the location of a given phoneme in the syllable. Final-consonant segments are
more difficult than IC segments because they are in the syllable-final position.

THE SEGMENT POSITION EFFECT IN VOWEL ERRORS



It can be seen from Table 3, which displays the error scores for the
vowel-controlled list of words, that the vowels do not show the marked position
effect of the consonants. The analysis of variance revealed only a marginally
significant effect of segment position [F(2,114) = 4.61, p < .05]. Again, there
was an increase in performance level with grade [F(2,114) = 11.08, p < .01], and
the interaction was nonsignificant.

Analyses performed separately on the error scores for each grade show that
the position effect for vowels was statistically significant at one grade level:
the fourth grade. This is in contrast to the position effect for consonants,
which was significant in all three grades. Post-hoc means tests of the
fourth-grade vowel data indicate that two differences accounted for the
significant F values: errors on vowels in the initial position, and in the initial
and final positions combined, both significantly exceeded errors on vowels in the
medial position. Thus, if a segment position effect for vowels can be said to
exist at all, it must be attributed to the significantly fewer errors on medial
than on initial and final vowels (and then only for the fourth-grade subjects).

TABLE 3: Errors on initial, medial, and final vowels and on initial and final
         consonants (List 2) presented as proportions of opportunity for error
         (decimal points omitted).

Grade    IV    MV    FV    ICᵃ   FCᵃ
  2      47    43    43    17    32
  3      28    27    31    09    19
  4      20    12    19    04    11

ᵃOccurrences not controlled.

THE RANK ORDERING OF CONSONANT ERRORS AND VOWEL ERRORS

We can now reexamine the vowel > final-consonant > initial-consonant rank
ordering of errors that we observed in our original experiment. It should first
be noted that because both the consonants and vowels could not be controlled
within a single list, the consonant-vowel error hierarchy cannot be directly
examined within either List 1 or 2. However, as can be seen in Tables 1 and 3,
if vowel errors are scored in the consonant-controlled list and consonant errors
in the vowel-controlled list, the vowel > final-consonant > initial-consonant
hierarchy of error frequency is replicated at every grade level within both lists.
It is clear that whereas vowels in any position elicit more errors than
consonants, the initial-final difference among the consonants is maintained.
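The rank ordering claimed here can be checked mechanically against the entries of Tables 1 and 3. The following is a sketch under the assumption that the table values are read as shown (in hundredths); the data layout and function name are illustrative.

```python
# Error proportions in hundredths, copied from Tables 1 and 3.
# Each entry: grade -> (vowel scores, FC score, IC score).
list1_scores = {2: ((27,), 16, 8), 3: ((15,), 10, 5), 4: ((8,), 6, 2)}
list2_scores = {
    2: ((47, 43, 43), 32, 17),
    3: ((28, 27, 31), 19, 9),
    4: ((20, 12, 19), 11, 4),
}

def hierarchy_holds(vowels, fc, ic):
    """True when every vowel score exceeds FC, and FC exceeds IC."""
    return min(vowels) > fc > ic

results = {
    (name, grade): hierarchy_holds(*row)
    for name, scores in (("List 1", list1_scores), ("List 2", list2_scores))
    for grade, row in scores.items()
}
print(all(results.values()))
```

The check uses the smallest vowel score in each row, so it tests the stronger statement that vowels in any position exceed the final consonants.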

On the consonant-controlled list of the present experiment, the difference
in error rate between the final consonants and the initial consonants found
earlier was replicated even after phonological and orthographic differences
between the two categories had been removed. The discrepancy, then, may be
attributed to some difference in difficulty between the initial and final segment
positions of the consonants in the syllable and not to the particular consonant
phonemes or the orthographic patterns that tend to occur in the two syllabic
locations.

On the other hand, the preponderance of vowel over consonant errors obtained
in our earlier experiment can no longer be attributed to the embedded position of
the vowel within the syllable. The results obtained with List 2 indicate that
vowels are approximately equal in difficulty across the three syllable locations.
We may conclude, therefore, that the vowels in our earlier experiment were more
difficult than the consonants for the beginning readers, not because of their
embedded location within the syllable, but, rather, because of characteristics
specific to vowels and not present in consonants.

In summary, we have looked to see where the errors are made in the syllable,
and have concluded that there is a position effect for the consonants.
Syllable-final consonants give rise to twice as many errors as syllable-initial
consonants. The position-related errors can therefore be viewed as an outcome
of the difficulties of phonological segmentation. However, the frequency of
vowel errors was not affected by the position of the vowel segment within the
syllable. Therefore, we cannot regard the child's difficulties with vowels as
a reflection of his inability to segment the syllable.

It may be argued (Liberman, 1973) that if the child's segmentation skills
were improved, his difficulties with the vowels would not be a severe handicap
to him in deciphering the text. This might be expected because the consonants
carry the major information load in the word. If the child were able to assign
correct sounds to the consonants in proper sequence, an incorrect rendition of
the vowels would be corrected fairly easily in context.

THE NATURE OF THE PHONETIC ERRORS IN BEGINNING READING

Having considered the location of the errors, we turn our attention now to
an examination of their nature. We found both in this experiment and in the
previous one (Shankweiler and Liberman, 1972) that vowels generate more errors
than consonants. It is appropriate to ask how the errors might be different in
the two phonetic classes. Our purpose in the following analysis was to look for
phonetic relationships between the misread segment and the target segment. Of
course, in ordinary reading the lexical and broader linguistic context may affect
the choice of the guessed-at word. We deliberately minimized the contribution of
context, as we have said, in order to be able to assign a relatively unambiguous
interpretation to the errors that occur.

Because the experiment required the children to read the words aloud, all
of them presumably had to make a transformation from a visual to a phonetic
representation. We may be sure then that the child is recoding the material
phonetically as he reads, and we can examine, segment by segment, the phonetic
relationship between the child's misreading and the segment that would be
produced in that position if the word were read as written. In order to make the
examination, we have adapted techniques used by other investigators to examine
errors of speech perception. There is much evidence from investigations of speech
perception (see, for example, Miller and Nicely, 1955) that phoneme segments are
themselves compounds of a small set of phonetic features and that errors in
perceiving speech by ear can be understood on a feature basis. That is, a
substituted phoneme, more often than not, is only a partial error, in the sense
that it preserves features in common with the presented segment.

Recent data obtained by Eimas (in press) show that the pattern of consonant
errors made by six- and seven-year-old children in recall of strings of visually
presented nonsense syllables resembles extremely closely the pattern obtained
with auditory presentation. Errors having more than one distinctive feature in
common with the presented phoneme occurred significantly more frequently than
errors sharing one or no features with the presented phoneme. These findings
would lead us to expect that as the child reads, he recodes the input into a
form that can be described in terms of a phonetic feature matrix.

If errors arise in the transformation from print to a phonetic code, then
the pattern of errors due to misreading might be expected to resemble that due
to mishearing. Thus, there is reason to expect that the frequency of misreading
would vary directly with the number of features shared between the presented and
the misread segments. Factors other than degree of phonetic contrast, however,
are likely to be involved in the misreadings of vowels. Whereas the rules
relating spelling to phonetic segment are relatively straightforward for
consonants, they are quite complex for vowels. For this reason, we might expect
to find not only that more errors occur on vowels than on consonants, but also that
the nature of the substitutions may be different for the two phonetic classes.

FEATURE SUBSTITUTION ERRORS AMONG CONSONANTS

To determine whether the misreadings among consonants pattern nonrandomly,
we needed a way to quantify the phonetic distance between any two consonants.
We also needed a way of comparing the observed frequency of errors at a given
phonetic distance from a target phoneme with the frequencies that would be
expected if the children were randomly assigning phonemes to letters.

For the purposes of this investigation, we defined phonetic distance in
terms of the number of distinctive features shared by an error response and a
target phoneme. Three features (voicing, place of articulation, and manner of
articulation) describe the English consonants adequately, providing each with a
unique feature description. For example, since /b/ and /p/ share two features,
they are considered phonetically similar; /b/ and /s/, which share no features,
are dissimilar. Each error response was classified in this manner, according to
the number of features it shared with its respective target phoneme. The
frequency of error responses in each of the phonetic-distance categories (zero,
one, or two features shared) was tallied separately for children of each grade
and for each consonant position.
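The shared-feature count can be illustrated with a short sketch. The feature values below are standard articulatory descriptions for four consonants only; the dictionary layout, the function name, and the sample error list are assumptions made for the illustration, not the coding scheme actually used.

```python
from collections import Counter

# (voicing, place of articulation, manner of articulation) for a few consonants.
FEATURES = {
    "b": ("voiced", "bilabial", "stop"),
    "p": ("voiceless", "bilabial", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "s": ("voiceless", "alveolar", "fricative"),
}

def shared_features(target, response):
    """Number of the three features on which target and response agree."""
    return sum(a == b for a, b in zip(FEATURES[target], FEATURES[response]))

# Hypothetical (target, response) error pairs, for illustration only.
errors = [("b", "p"), ("b", "d"), ("b", "s"), ("p", "s")]
tally = Counter(shared_features(t, r) for t, r in errors)
print(tally)
```

With these values /b/ and /p/ share two features (place and manner) and /b/ and /s/ share none, as in the example in the text.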

Frequencies expected by chance were calculated by constructing a 19 x 22
triangular matrix with the 19 target phonemes (that is, the 19 consonants that
appeared in List 1) represented vertically and the complete set of the 22
consonants of English represented horizontally. Each cell of the triangular matrix
thus uniquely represented a target phoneme paired with a possible error response.
We made the assumption that a child responding randomly would choose his
responses only from among the set of English consonants. In each cell were listed
the features shared in common by the appropriate target phoneme and error
response. The frequencies of cells with entries containing zero, one, or two
features shared by the target consonant and each possible erroneous response were
tallied separately. These were expressed as proportions of errors that would be
expected to share zero, one, or two features with the target phoneme if the
children were assigning phoneme categories to letters on a random basis. The
total number of errors for each grade and consonant position was multiplied by
each proportion, thus providing an estimate of the number of errors expected to
fall into each phonetic distance category under the assumption of randomness.
These expected frequencies were statistically compared with the obtained
frequencies using the χ² analysis. Table 4 presents the obtained and expected
frequencies and the value of χ² by grade and consonant position.⁴
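The chance-expectation procedure can be sketched on a toy inventory. Everything in the example is an assumption made for illustration: the four-consonant inventory stands in for the 22 English consonants, the two targets stand in for the 19 List 1 consonants, and the exclusion of a target paired with itself is one reading of how the matrix cells were counted.

```python
from collections import Counter

def expected_frequencies(targets, inventory, features, total_errors):
    """Expected error counts per shared-feature category under random responding."""
    cells = Counter()
    for target in targets:
        for response in inventory:
            if response == target:
                continue  # a correct response is not an error (assumption)
            shared = sum(a == b for a, b in zip(features[target], features[response]))
            cells[shared] += 1
    n_cells = sum(cells.values())
    # Proportion of cells in each category, scaled to the observed error total.
    return {k: total_errors * cells[k] / n_cells for k in (0, 1, 2)}

# Toy inventory: (voicing, place, manner). Not the inventory used in the study.
toy_features = {
    "b": ("voiced", "bilabial", "stop"),
    "p": ("voiceless", "bilabial", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "s": ("voiceless", "alveolar", "fricative"),
}
expected = expected_frequencies(["b", "p"], list(toy_features), toy_features, 12)
print(expected)
```

The same function applied to a full feature table and the observed error totals would yield expected columns of the kind reported in Table 4.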



⁴We are aware that the analyses presented in Tables 4 and 5 below violate the
independence assumption of the χ² analysis. Consequently, we cannot draw our
conclusions from the results of the analysis with any certainty. However, we
know of no more appropriate analysis.

TABLE 4: Observed and expected frequencies of consonant errors sharing zero,
         one, or two features with the target sounds.

                        Number of shared features
               0                    1                    2
Grade  Observed Expected   Observed Expected   Observed Expected     χ²      p
  2       11      44          43      65          81      28       132.5   <.001
  3        8      25.1        22      37.2        47      16.8      72.2   <.001
  4        3      12.4        13      19.6        26      10.5      32.2   <.001
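The χ² values in Table 4 can be recomputed from the observed and expected frequencies with the usual goodness-of-fit formula, the sum of (O - E)²/E over categories. The sketch below assumes 2 degrees of freedom (three categories), which the text does not state, and uses the closed form exp(-χ²/2) for the upper tail at 2 degrees of freedom. As the authors' footnote notes, the independence assumption of the test is violated, so the p values are indicative only.

```python
import math

# grade -> (observed, expected) counts for 0, 1, and 2 shared features (Table 4).
table4 = {
    2: ((11, 43, 81), (44, 65, 28)),
    3: ((8, 22, 47), (25.1, 37.2, 16.8)),
    4: ((3, 13, 26), (12.4, 19.6, 10.5)),
}

def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df2(stat):
    """Upper-tail probability of chi-square with 2 degrees of freedom (assumed df)."""
    return math.exp(-stat / 2)

def percent_sharing_two(observed):
    """Percentage of observed errors that share two features with the target."""
    return round(100 * observed[2] / sum(observed))

for grade, (obs, exp) in table4.items():
    stat = chi_square(obs, exp)
    print(grade, round(stat, 2), p_value_df2(stat) < 0.001, percent_sharing_two(obs))
```

The recomputed statistics agree with the tabled values to within rounding of the expected frequencies, and the observed counts give the 60, 61, and 62 percent figures for errors sharing two features.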

Our expectation that the child's errors would be governed by phonetic
features appears to be strongly supported by the consonant data. As can be seen in
Table 4, the χ² values for each grade and consonant position are significant,
with p < .001.

The proportion of consonant errors falling into the two-feature-shared
category is remarkably stable across the grades: 60 percent of second-grade errors,
61 percent of third-grade errors, and 62 percent of fourth-grade errors share
two features with their appropriate target phonemes. The results suggest,
therefore, that phonetically motivated substitutions contribute substantially to
the consonant error pattern both at the very early stages of reading acquisition
and beyond.

FEATURE SUBSTITUTION ERRORS AMONG VOWELS

Vowel errors were treated in much the same way as the consonant errors. A
number of feature systems for vowels have been proposed, but none has won such
strong empirical support as to give a clear basis for choice. The feature
system we used was a modification of that proposed by Singh and Woods (1971).
Three of their features (tenseness, tongue advancement, and tongue height)
distinguish each nondiphthongized vowel from every other. A fourth feature,
retroflexion, distinguishes only the vowel /ɝ/ from other vowels. Since /ɝ/ is an
infrequent response in our data, we did not incorporate this feature in our
analysis. In its place, we added the feature diphthongization, in order to
distinguish diphthongized from nondiphthongized vowels.

The vowel errors, like the consonant errors, were classified according to
the number of features they shared with their respective target phonemes. The
frequency of errors in each phonetic distance category (zero, one, two, or three
features shared with the target) was again compared with the frequencies that
would be expected if the child were randomly assigning phonemes to spellings.

The results of the vowel feature analysis, shown in Table 5, reveal a
picture very different from the comparable analysis of consonant errors. The
table gives grouped frequencies of errors on the vowel classified according to
the number of features shared with the target vowel. Again, expected
frequencies are calculated on the null assumption that the distribution of errors
within these categories is random. Whereas for consonants the effect of phonetic
distance was significant across all grades, the vowel errors displayed in
Table 5 reveal no consistent direction in the differences between observed and
expected frequencies. Thus, for vowels, it appears that given the occurrence
of an error, the assignment of phoneme to grapheme was random.

TABLE 5: Observed and expected frequencies of vowel errors sharing zero or one
         feature, or sharing two or three features with the target sounds.

                Number of shared features
               0-1                  2-3
Grade  Observed Expected   Observed Expected     χ²      p
  2      285      304        262      243       2.67   >.10
  3      180      191        170      158       1.54   >.20
  4      110      118        105       96       1.38   >.20
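The vowel figures in Table 5 can be checked in the same way as the consonant figures. In the sketch below the grade 2 expected value for the 0-1 category is taken as 304 and the grade 3 value as 191; both readings are assumptions, chosen because they reproduce the printed χ² values. One degree of freedom is also assumed (two categories), for which the upper-tail probability is erfc(sqrt(χ²/2)).

```python
import math

# grade -> (observed, expected) counts for 0-1 and 2-3 shared features (Table 5).
table5 = {
    2: ((285, 262), (304, 243)),  # 304 is a reading of a damaged table entry
    3: ((180, 170), (191, 158)),  # 191 is a reading of a damaged table entry
    4: ((110, 105), (118, 96)),
}

def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(stat):
    """Upper-tail probability of chi-square with 1 degree of freedom (assumed df)."""
    return math.erfc(math.sqrt(stat / 2))

vowel_results = {
    grade: (chi_square(obs, exp), p_value_df1(chi_square(obs, exp)))
    for grade, (obs, exp) in table5.items()
}
for grade, (stat, p) in vowel_results.items():
    print(grade, round(stat, 2), round(p, 2))
```

None of the three statistics approaches conventional significance, in contrast to the consonant analysis.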

The contrasting results obtained for vowels and consonants are indeed
striking. The opposition of these phonetic classes is revealed by both approaches
to error analysis: the first, in which we investigated misreadings in relation to
their location in the syllable, and the second, in which we considered the
phonetic characteristics of the errors. From the latter analysis, we are led to
conclude that the concept of degree of phonetic contrast, so successful in
rationalizing the errors on consonants, does not enable us to understand the vowel
errors. For these, other sources of difficulty must be sought.

At all events, these differences in error pattern between the consonants
and vowels lend credibility to the position taken by ourselves and other
investigators (Liberman, Shankweiler, Orlando, Harris, and Bell-Berti, 1971;
Vellutino, Steger, and Kandel, 1972; Vellutino, Pruzak, Steger, and Meshoulam,
1973) that visual factors are not sufficient to account for the difficulties of
the beginning reader. Surely, problems in scanning, eye movements, and/or the
apprehension of the optical form of letters cannot explain the differences in
consonant and vowel error patterns that we have found. Consonants and vowels cannot
be meaningfully classified in terms of their visual characteristics; the
differences in error pattern therefore could not be related to a classification
made on that basis. Consonants and vowels do, on the other hand, form distinctive
categories in the language and have different functional roles in communication.

Considered from the standpoint of their contribution to the phonological
message, consonants carry the heavier information load. Vowels, on the other
hand, are the foundation on which the syllable is constructed, and as such are
the carriers of prosodic features. It is the vowels that are the more fluid
and variable of the two classes of phonetic elements, more subject to phonetic
variation across individuals and dialect groups, and more subject to phonetic
drift over time. As we suggested in an earlier paper (Shankweiler and Liberman,
1972), the relatively greater variability of vowels than consonants may in part
account for the different ways these segments are represented in the
orthography. It may account for the fact that in English, at least, there tend to
be many spellings for each vowel and more nearly one-to-one spelling-to-sound
relationships for the consonants.



SUMMARY AND CONCLUSIONS

The errors children make in reading, before they have fully mastered the
skill, can teach us something about the special problems of learning to read.
In an earlier study, we observed, as others have, that errors on the final
consonant of a CVC syllable far exceed those on the initial consonant.
Additionally, we found that errors on medial vowels exceed those on consonants in
both initial and final position. The first purpose of the present study was to
confirm these earlier findings and, by the use of various controls, to test their
generality.

We found the same pattern of consonant errors as previously obtained, with
those in final position being misread twice as often as those in initial
position. As a result of the controls introduced in the present study, we can now
conclude that the findings represent a true position effect. It cannot be
attributed to a different phonological distribution of consonants in
syllable-initial and in syllable-final position, nor can it be attributed to
differences in the orthography associated with beginnings and ends of words. Having
ruled out these interpretations of the position effect, we believe the greater
difficulty of the final consonant is the result of the child's defective
understanding of the phonological segmentation of his spoken language. We know from
earlier work of our own and others that inability to indicate the phonemic
segmentation of heard speech is characteristic of the prereading child. Given the
difficulty in becoming explicitly aware that syllables may be analyzed into
strings of phonological segments, it seemed reasonable to suppose that the task
of phonological segmentation might vary in difficulty with the position of a
given segment in the syllable. On this basis, the initial segment should be
easiest to isolate because it can be extracted without analysis of the internal
structure of the syllable.

In contrast to the findings on consonant misreadings, errors on vowels show
no effect of position. When we placed the vowel in initial, medial, and final
position in the syllable, the errors did not vary in any systematic fashion. We
suppose, therefore, that vowel errors do not reflect primarily the child's
difficulties in phonological segmentation, but rather the complexity and
variability of the spelling-to-sound correspondences.

The assumption that consonant and vowel errors have different causes was
supported by the results of a further analysis that took account not of the
location of the errors, but of their nature. In that analysis, it was found that
consonant errors were systematically related to the presented phoneme, differing
from it most often in only one feature. Vowel errors, in contrast, were not
systematically related to the phonetic features of the presented vowel; indeed,
the feature distribution of the vowel errors was essentially random. Such
differences in the distribution of errors on consonants and vowels in reading may
reflect the different functions of those phonetic classes in speech. Perhaps
the most general implication of these differences in error pattern between
consonants and vowels is that they underscore the importance of nonvisual cognitive
processes in reading. These findings lend confirmation to our belief that
visual factors contribute rather little to the difficulties of beginning reading;
certainly less than factors relating to the language, such as awareness of
phonological segmentation, phonetic recoding, and the structure of the orthography.



REFERENCES

Buckingham, B. and E. Dolch. (1936) A Combined Word List. (Boston: Ginn &
     Co.).
Calfee, R., R. Chapman, and R. Venezky. (1972) How a child needs to think to
     learn to read. In Cognition in Learning and Memory, ed. by L. W. Gregg.
     (New York: Wiley).
Daniels, J. C. and H. Diack. (1956) Progress in Reading. (Nottingham, England:
     University of Nottingham).
Eimas, P. D. (in press) Distinctive-feature codes in the short-term memory of
     children. J. Exp. Child Psychol.
Gibson, E. and H. Levin. (1975) The Psychology of Reading. (Cambridge, Mass.:
     MIT Press).
Liberman, I. (1971) Basic research in speech and lateralization of language:
     Some implications for reading disability. Bull. Orton Soc. 21, 71-87.
Liberman, I. (1973) Segmentation of the spoken word and reading acquisition.
     Bull. Orton Soc. 23, 65-77.
Liberman, I., D. Shankweiler, F. W. Fischer, and B. Carter. (1974) Reading and
     the awareness of linguistic segments. J. Exp. Child Psychol. 18, 201-212.
Liberman, I., D. Shankweiler, C. Orlando, K. Harris, and F. Bell-Berti. (1971)
     Letter confusions and reversals of sequence in the beginning reader:
     Implications for Orton's theory of developmental dyslexia. Cortex 7, 127-142.
Miller, G. and P. Nicely. (1955) An analysis of perceptual confusions among
     the English consonants. J. Acoust. Soc. Am. 27, 338-352.
Rosner, J. and D. P. Simon. (1970) The Auditory Analysis Test: An Initial
     Report. (Pittsburgh: University of Pittsburgh Learning Research and
     Development Center).
Rozin, P. and L. Gleitman. (in press) The structure and acquisition of
     reading. In Reading: Theory and Practice, ed. by A. S. Reber and D.
     Scarborough. (Hillsdale, N. J.: Lawrence Erlbaum Assoc.).
Shankweiler, D. and I. Liberman. (1972) Misreading: A search for causes. In
     Language by Ear and by Eye: The Relationships between Speech and Reading,
     ed. by J. F. Kavanagh and I. G. Mattingly. (Cambridge, Mass.: MIT Press).
Singh, S. and D. R. Woods. (1971) Perceptual structure of 12 American English
     vowels. J. Acoust. Soc. Am. 49, 1861-1866.
Vellutino, F., R. Pruzak, J. Steger, and U. Meshoulam. (1973) Immediate visual
     recall in poor and normal readers as a function of orthographic-linguistic
     familiarity. Cortex 8, 106-118.
Vellutino, F., J. Steger, and G. Kandel. (1972) Reading disability: An
     investigation of the perceptual deficit hypothesis. Cortex 9, 370-386.
Weber, R. (1970) A linguistic analysis of first-grade reading errors. Read.
     Res. Quart. 5, 427-451.

Comments on the Session: Perception and Production of Speech II; Conference on
Origins and Evolution of Language and Speech*

A. M. Liberman+



The interesting papers we heard all dealt in one way or another with a
question that is surely central to an inquiry into the biology of language: Are
linguistic processes in some sense special, different from the processes that
underlie nonlinguistic activities and, perhaps, unique to man? To discuss that
question, and the papers of the evening's session, I find it useful to
distinguish two classes of specialized processes, auditory and phonetic.

Specialized auditory processes would serve, perhaps in the fashion of
feature detectors, to extract those aspects of the acoustic signal that carry the
important information. One is led to suppose that such devices might exist
because it is true, and paradoxical, that some of the most important phonetic
information is contained in parts of the speech sound that are not physically
salient. Thus, a significant acoustic cue is in the formant transitions, though
these are often of short duration and rapidly changing frequency. Perhaps there
are devices devoted to detecting those transitions. If so, we should hold them
up as examples of specializations in the auditory system. They would be
important for the perception of language, but not properly part of its special
processes.

If the acoustic signal were directly related to the phonetic message, then
detection of the phonetically important cues would be sufficient for phonetic
perception; no further processing would be necessary. But the relation between
signal and message is peculiarly complex. [For summary accounts, see Fant
(1962); Cooper (1972); Stevens and House (1972); Liberman (1974); Studdert-
Kennedy (1974).] As a result, the specialized auditory detectors can only begin
the job; the auditory display they produce must still be interpreted, because
the phonetic message is there in such highly encoded form. If there are devices
specialized to do that kind of interpreting, then I should consider them
phonetic, not auditory. Since I will organize my comments on the papers of the
evening in terms of that distinction, I should take a moment to illustrate what
I mean.

Consider the formant transitions that are important cues for the perception 
of stop consonants In syllable- initial position, and call up in your mind's eye 
spectrographic representations- (similar to those shown by Dr., Morse) of such 
transition cues as would be appropriate for tda] and [ba] . Now add a patch of 



^P^er deliverei?l at the New York Academy of Sciences, 22-25 September 1975. 
"^Also University of Connecticut, Storrs, and Y,ale University, New Haven, Coi 
[HASKINS LABORATORIES: , Status Report on Speech Research. SR-45/46 (f976)] 






fricative noise (the hiss of [s]) just before the [da]. If that patch is
immediately in front of the [da], you will hear [sa], not [da]; the stop will
have disappeared completely. But if the patch is moved away so as to leave
about 50 msec of silence between the end of the hiss and the beginning of the
formant transitions, then you will hear [sta]; that is, you will hear the stop
once again. The generalization that captures those facts, and many others
closely related to them, is that a necessary condition for the perception of
syllable-initial stop consonants is a brief period of silence in front of the
appropriate transition cues. But why should silence be necessary? Why should it
be impossible to hear the stop when its acoustic cues follow closely on the
fricative noise?

The simplest explanation, surely, is that we are here dealing with a
characteristic of the generalized mammalian auditory system. That might seem
reasonable if only because in putting the fricative noise in front of the
transition cues we have conformed to the paradigm for auditory forward masking.
But a search of the literature on such masking uncovers no reason to suppose
that it could, in fact, provide the account we seek; forward masking does
occur, but it is not nearly so strong as to produce the total disappearance of
the stop consonant in [sa]. [See, for example, Elliott (1971) and Leshowitz and
Cudahy (1973).]

Consider, now, a second interpretation. Suppose there are transition
detectors of the kind I speculated about and suppose, further, that the
fricative noise disables them, rendering them ineffective in extracting the
transition cues for the stop consonant. In fact, there is very indirect
evidence that such transition detectors may exist in man. Thus, work by Kay and
Matthews (1972) suggests that there may be detectors sensitive to frequency
modulations, at least within a certain range. More, and perhaps more indirect,
evidence comes from studies on the so-called adaptation-shift phenomenon, first
found in speech by Eimas and Corbit (1973) and since studied by a number of
investigators. [For a review, see Cooper (1975) and Darwin (in press).] Among
those studies is a recent one by Ganong (1975) that I will describe, if only
briefly, because its outcome has several implications for our concern with
specialized processes: it suggests, as do several other such studies, that
transition detectors may exist, but it also indicates that such detectors are
in no way disabled by the fricative noise of our example.

Ganong's experiment went like this. Having first found the boundary between
synthetic [da] and [ba], Ganong adapted his subjects with [da] and measured the
resulting shift in the [da-ba] boundary. Then he put a patch of fricative noise
in front of the [da] and adapted his subjects with the [sa] syllable that they
all heard when the fricative-patch-plus-[da] was sounded. The effect on the
[da-ba] boundary was at least as great as when the adaptation was carried out
with [da]. As a control against the possibility that [sa] had its effect
because it worked on the same abstract phonetic-feature detector as [da] ([s]
and [d] have the same place-of-production feature), Ganong adapted with a [sa]
from which the formant transitions had been removed; in that condition the
effect on the [da-ba] boundary was much smaller. Those results suggest that the
adaptation shift in the [da-ba] boundary was caused by a change in the state of
some device that responds to formant transitions; thus, they support the
assumption that there are such things as transition detectors.






But Ganong's results also show, more generally, that the transition cues 
following the fricative noise were getting through in full strength, at least 
as auditory events. If those transition cues nevertheless failed to produce 
perception of a stop consonant, it was not because they were absent from the
auditory display. [Other kinds of evidence for the same conclusion are reviewed 
in Liberman (in press).] 

We are led, then, to a third explanation for the disappearance of the stop
consonant: silence is necessary for the perception of stop consonants, not
because it provides time to evade normal auditory forward masking, and not
because it prevents the disabling of specialized transition detectors, but
because it provides information. The information is that the speaker did indeed
make the total closure of the vocal tract necessary to the production of a stop
consonant. Thus, given enough silence to indicate a sufficient closure of the
vocal tract, a specialized phonetic device could interpret the transition cues
as reflecting a linguistic event that included the stop-consonant segment [d].
Hence the perception [sta] when a silent interval of about 50 msec is placed
between the end of the hiss and the beginning of the transitions. Without that
silent interval the only reasonable phonetic interpretation is that the vocal
tract did not close completely. Hence [sa].
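
The third explanation can be restated as a toy decision rule. The sketch below is only an illustration, not a perceptual model: the function name and the sharp 50 msec threshold are assumptions (the text says only "about 50 msec"), and the transitions are taken to reach the phonetic stage intact, as Ganong's adaptation result suggests.

```python
# Toy restatement of the "silence as information" account.
# Hypothetical names; the sharp threshold is an assumption for illustration.
SILENCE_THRESHOLD_MS = 50  # "about 50 msec" in the text


def phonetic_interpretation(silence_ms, has_stop_transitions=True):
    """Return the syllable reported for hiss + silent gap + vowel pattern."""
    if silence_ms < 0:
        raise ValueError("silence_ms must be non-negative")
    # Silence does not unmask the transitions; it signals that the vocal
    # tract closed completely, which licenses reading them as a stop.
    closure_signaled = silence_ms >= SILENCE_THRESHOLD_MS
    if has_stop_transitions and closure_signaled:
        return "[sta]"
    return "[sa]"


print(phonetic_interpretation(0))   # hiss immediately before the transitions
print(phonetic_interpretation(50))  # hiss, 50 msec gap, then the transitions
```

The point of writing it this way is that the auditory input (the transitions) is the same in both calls; only the interpretation changes with the silent interval.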

So much, then, for the possibility that there are at least two different
kinds of devices specialized for speech. Let me now comment on the papers of
the evening with reference to that distinction.

In the presentation by Dr. Andrews we saw interesting evidence that baboons
change the configuration of their vocal tracts so as to produce something like
formant transitions and, further, that such transitions may convey information
from one baboon to another. If it is indeed the formant transitions that carry
the information, and if the transitions are as brief and rapid as they sometimes
are in human speech, then we should not be surprised to find feature detectors
specialized to track them. And in working with baboons we might, of course,
expect to get at such devices more directly than we can in research on human
beings.

Though baboons may produce and respond to rapid transitions, we have as yet
found no reason to believe that they (or, indeed, any creatures other than man)
produce or perceive phonetic strings. I should doubt, therefore, that we would
find the specialized phonetic processor to which I referred. But what I doubt
is surely not important. What is important, I should think, is that we can find
out whether baboons do have something like transition detectors and also whether
they behave toward speech as if they make a phonetic interpretation. Dr. Andrews
has given us a good start in that direction.


The experiments that Philip Morse described are a model of how to learn
about the biology of language. To select some interesting characteristic of
human speech perception and then look for that characteristic in prelinguistic
infants and nonhuman primates is surely one of the best ways to uncover whatever
there may be of biological predisposition, specialized process, and species
specificity. The experiments are certainly hard to do, but they are very much
worth doing, and Dr. Morse does them very well indeed.

The results Dr. Morse told us about this evening were interpreted by him in
terms of the possibility that there are devices like transition detectors. In
his view, such devices might explain categorical perception of the place
distinction for stop consonants in infants and the somewhat in-between tendency
toward categorical perception he got in monkeys. I think it quite reasonable to
suppose that the output of such detectors would be categorical. I doubt,
however, that the concept of feature detector could take us very far toward
explaining the perception of stop consonants, except by a kind of metaphorical
extension. Some of the reasons for my doubt will, perhaps, become clearer in
connection with the examples I mean to develop when I discuss Dr. Warren's paper
in a few moments, so I will say no more about those reasons now. In fairness to
Dr. Morse, however, I should emphasize that he was not trying to explain the
perception of stop consonants, nor even the perception of the place feature, but
only some data on discrimination and tendencies toward categorical perception in
infants and monkeys.


As for Dr. Morse's experiment, I should say that in using three formants
instead of two he gained the advantage of greater realism but at the cost of
some added difficulty in interpretation of the results. That difficulty arises
because when second- and third-formant transitions are both varied, it is harder
to scale physical similarity and therefore that much harder to assess tendencies
toward categorical perception. If one nevertheless prefers to use the
three-formant patterns because they are closer to what occurs in speech, he
might reduce the difficulty I referred to by coupling the transition cues with a
variety of vowels, thus randomizing the acoustical similarities; if the
discrimination functions nevertheless come out the same way they did in Dr.
Morse's experiment, the conclusion would be quite compelling.

Still, the results so far obtained with infants are impressive. The infants
of Morse's study did show a strong tendency toward categorical perception of
the place distinction in the stops, and, as Morse pointed out, that result
accords with those obtained by other investigators. In the case of the monkeys,
however, it is a good deal less clear that perception of the stops is
categorical. There was, in the monkeys of Dr. Morse's experiment, some tendency
in that direction, though less apparently than with the infants. In that
connection, we should keep in mind the results of the earlier study by Sinnott
(1974), to which both Morse and Warren referred. Using reaction time as the
measure, Sinnott found that her monkeys, like those of Morse, discriminated
within phonetic categories; but they did not discriminate better across
phonetic boundaries than within them. That is, Sinnott's monkeys did not show
any appreciable tendency toward categorical perception, though her human
subjects did.
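
The comparison at issue, discrimination across the phonetic boundary versus discrimination within categories, can be put as a simple difference score. The sketch below is one possible way to express it, not the measure used in any of the studies cited; the function name and the numbers in the usage lines are hypothetical and are not data from Morse or Sinnott.

```python
def categorical_tendency(scores, boundary_pair):
    """Across-boundary accuracy minus mean within-category accuracy.

    scores: dict mapping a stimulus pair, e.g. (2, 3), to discrimination
            accuracy (0 to 1) for that pair.
    boundary_pair: the pair that straddles the phonetic boundary.
    A value near zero means no peak at the boundary; a clearly positive
    value means some tendency toward categorical perception.
    """
    across = scores[boundary_pair]
    within = [acc for pair, acc in scores.items() if pair != boundary_pair]
    if not within:
        raise ValueError("need at least one within-category pair")
    return across - sum(within) / len(within)


# Hypothetical numbers, chosen only to show the two patterns described.
peaked = {(1, 2): 0.55, (2, 3): 0.90, (3, 4): 0.55}  # peak at the boundary
flat = {(1, 2): 0.80, (2, 3): 0.80, (3, 4): 0.80}    # good but uniform
print(categorical_tendency(peaked, (2, 3)))
print(categorical_tendency(flat, (2, 3)))
```

On this score, listeners who discriminate well everywhere, as Sinnott's monkeys are described as doing, come out near zero even though their within-category discrimination is good.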

Since the experiment on discrimination of the voicing distinction by
chinchillas (Kuhl and Miller, 1975) was several times referred to by our
speakers, I should also comment on that. It is surely of interest that the
chinchillas "classified" the speech stimuli so as to put the boundary in much
the same place that human listeners do. Given that the relevant acoustic cue is
the relative time of onset of two parts of the pattern, it is also of interest
that research with nonspeech sounds has found a categorical "notch" in the
auditory system at a relative displacement appropriate to the speech-sound
boundary (Miller, Pastore, Wier, Kelly, and Dooling, 1974). In the case of the
voicing distinction, it may be, therefore, that in the development of language,
nature took advantage of a categorical distinction characteristic of some
mammalian auditory systems, though special adjustments in the articulatory
mechanisms would presumably have been necessary to get them to produce
accurately just that small difference in timing required to put the sounds
within the preset (and rather narrow) constraints of the ear.

I nevertheless have several reservations, even about this apparently simple
case. Using an expanded range of the same stimuli that were used in the
chinchilla experiment, Wilson and Waters (1975) found that variations in
stimulus range caused rhesus macaque monkeys to shift their "boundary" from 28
msec, which happens to be about where the chinchilla boundary was, to 66 msec.
(They also found some tendency toward categorical perception, wherever the
boundary was.) That kind of change, which implies that the monkeys may have
been splitting the range, does not occur in human subjects. [See, for example,
Sawusch, Pisoni, and Cutting (1974).] The possibility that such a change might
occur in chinchillas was not controlled for.


My other reservation arises from the fact that the human boundary is not
fixed at either of the boundaries so far found with animals and with nonspeech
sounds, but rather varies (together with the categorical notch) from 18 msec to
as much as 45 msec as a function of the duration of the transitions and the
frequency at which the first formant begins (Stevens and Klatt, 1974; Lisker,
Liberman, Erickson, and Dechovitz, 1975). (The variation with duration of the
transitions may reflect a normalization for rate of articulation.) I would be
interested to know if the chinchilla's boundary moves in the same way. It would
also be interesting to know if the chinchilla, or any other animal, appreciates
that the voicing distinction is, indeed, the same in those cases in which the
relevant acoustic cues are entirely different. What happens, for example, when
the distinction is moved from initial position (e.g., [bi] vs. [pi], which is
the kind of distinction so far studied in animals) to intervocalic position
following a stressed syllable (e.g., [raebid] vs. [raepid]), where a sufficient
cue is the time interval between the two syllables; or to final position (e.g.,
[raeb] vs. [raep]), where a sufficient cue is the duration of the preceding
vowel (plus consonant-vowel transition)? To "understand" that such distinctions
have something in common despite gross difference in the acoustic cues would
constitute an impressive demonstration of phonetic interpretation.

We come now to that part of this evening's program that touched more
directly on the matter of specialized phonetic processes. The relevant paper
was given by Richard Warren. He reminded us of his earlier experiments (very
important experiments, in my view) in which he found that the auditory system
does not measure up to one of the requirements of phonetic perception. The
requirement is that the order of the phonetic segments be preserved; the word
"bad" is different from the word "dab." Now if we measure the rates at which
speech is produced and perceived, we find that the durations we can allot to the
phonetic segments are often very short. Indeed, those durations can be as little
as 50 msec per segment or, for brief periods, even less. But Dr. Warren has
found with nonspeech sounds that the ear cannot properly cope with segments of
those temporal dimensions. At the very short durations that we can assign to
phonetic segments, the ear can discriminate one order of segments from another
(that is, it can hear distinctively different patterns) but, as Dr. Warren told
us, it is unable to identify the separate components in the order of their
occurrence. Now I will not here review or comment on Dr. Warren's solution to
this very real problem. I will rather offer an alternative, which is that in
perceiving the order of the phonetic segments we need not, and indeed do not,
rely on the temporal order of acoustic segments. Indeed, I would argue that
even if the ear were able to identify the order of very short-duration acoustic
segments, it could hardly make use of that ability in perceiving speech. That
would be so because the string of phonetic segments is drastically restructured
in the conversion to sound, with the result that segmentation of the sound does
not correspond directly to the segmentation of the message; accordingly, the
segments are not signaled simply by acoustic events in ordered sequence. But,
fortunately for the integrity of the message, information about segment order is
nevertheless conveyed, though by acoustic cues that could be interpreted, I
should think, only by a device that "knows" the secret of the code; that is, by
a phonetic device.

Let us consider, for example, the matter of segment order in the syllables
[ba] and [ab] and see how information about the phonetic structure is carried in
the sound. In producing those syllables, the gestures for the segments [b] and
[a] are not made discretely and in turn. Rather, as we well know, the gestures
are organized into units larger than a segment (something like a syllable,
perhaps) and then coarticulated. If the [ba] and [ab] syllables had been produced
at a moderately high rate of articulation, we should then see for [ba] an
acoustic signal lasting perhaps 70 or 80 msec and containing three formants that
rise from the beginning of the acoustic syllable to the end. For [ab] we should
see the mirror image; that is, three formants that fall. If we search out the
information about [b], we find that it exists not just at the beginning (for
[ba]) or at the end (for [ab]), but throughout the acoustic syllable.
Information about the vowel is also carried from one end of the sound to the
other. It is as if the coarticulation has effectively folded consonant and vowel
into the same piece of sound. As a result, there is no acoustic criterion by
which one can divide the speech signal into segments corresponding to the
segments of the phonetic message. A further consequence is that the cues for the
segments must necessarily exhibit a great deal of context-conditioned variation:
the transition cues for the consonant, for example, are rising in the one case
and falling in the other. (It should be remarked that when we listen to those
transitions in isolation we hear rising and falling glissandos, just as our
knowledge of auditory psychophysics would lead us to expect.)

To explain how a listener might recover the identity of the segments (that
is, know that there is a consonant [b] and a vowel [a]) we might suppose that
there is a specialized phonetic device that can "hear through" the
context-conditioned variation in the acoustic cues and arrive at the canonical
form of the segments. If so, then that same device could use the same
context-conditioned variation to discover the order of the segments: for if the
rising pattern contains a [b], then it could only be a syllable-initial [b]; and
if the falling pattern contains a [b], it could only be a syllable-final [b].
Thus, I would suppose that perceiving the order of the phonetic segments does
not depend on the ability of the ear to deal with discrete sounds of short
duration, but rather on the operation of a special phonetic device that is able
to cope with the fact that information about order is often encoded in the sound
as variations in acoustic shape. Indeed, I would suppose that such encoding
would seem nicely designed to evade just those limitations of the ear that Dr.
Warren's research has revealed.
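
The [ba]/[ab] argument amounts to saying that segment order can be read off the shape of the formant pattern instead of the order of discrete acoustic events. A minimal sketch of that idea follows. It is a deliberately crude illustration under stated assumptions: the function names are hypothetical, direction is judged from the endpoints of each track only, and the frequency values in the usage lines are invented for the example, not measurements.

```python
def formant_direction(track_hz):
    """Classify one formant track as 'rising', 'falling', or 'flat'."""
    if len(track_hz) < 2:
        raise ValueError("a track needs at least two samples")
    delta = track_hz[-1] - track_hz[0]
    if delta > 0:
        return "rising"
    if delta < 0:
        return "falling"
    return "flat"


def segment_order(formant_tracks):
    """Recover the order of [b] and [a] from the shape of the syllable.

    All formants rising  -> syllable-initial [b], i.e. [ba].
    All formants falling -> syllable-final [b],   i.e. [ab].
    Anything else is left undecided (None) in this toy version.
    """
    directions = {formant_direction(track) for track in formant_tracks}
    if directions == {"rising"}:
        return ["b", "a"]
    if directions == {"falling"}:
        return ["a", "b"]
    return None


# Invented three-formant patterns (Hz), for illustration only.
ba_like = [[200, 450, 700], [900, 1000, 1100], [2100, 2300, 2400]]
ab_like = [list(reversed(track)) for track in ba_like]
print(segment_order(ba_like))
print(segment_order(ab_like))
```

Note that the function never asks which acoustic piece came first; it uses only the context-conditioned shape of the whole syllable, which is the sense in which such a device would sidestep the limits on judging the order of very brief sounds.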

I should comment finally on the paper by Philip Lieberman. His work is
especially interesting from my point of view because it offers evidence for a
specialization associated with the production of speech that is, in an important
sense, analogous to the transition detectors of the auditory system. To see the
analogy, we should consider what might have occurred as grammar, and hence
language, evolved. The view I want to present has been developed elsewhere
(Liberman, 1974), so I will only outline it here.

If, as in an agrammatic system of acoustic communication, the messages were
directly linked to sounds, the number of messages we could communicate would be
limited to the number of holistically different sounds we can produce and
perceive. And that is a relatively small number. But grammar drastically
restructures the information in the message, making it appropriate, at the one
end, for the great message-generating capabilities of the brain and, at the
other, for the relatively limited abilities of the vocal tract and the ear to
produce and perceive sounds. Viewed this way, the processes underlying grammar
evolved as a kind of interface between two different kinds of structures,
adapting the potentialities of the one to the limitations of the other. (My
earlier comments on evading the auditory limitations described by Dr. Warren are
an example of this kind of grammatical function at the very lowest level of the
linguistic system; that is, at the conversion from phonetic message to sound.)
But it is also possible that in this evolutionary process the structures being
linked by the grammar might themselves have changed. On the perception side of
the process an example would be the development of transition detectors in the
auditory system to extract just that information which the phonetic
(grammatical) system used in carrying out its peculiar function. And on the
production side there are the changes in the vocal tract that Dr. Lieberman has
told us about. Those changes have apparently made the vocal tract less limited
for phonetic communication, and so have reduced the mismatch between that organ
and the message-generating intellect, a mismatch otherwise taken care of by the
grammar. We might suppose that if we had to speak with the vocal tract of a
nonhuman primate, the grammatical interface would have to be even more complex
than it is.


I think I can justifiably end my comments on a hopeful note. Those of us
who care about speech and the biology of language have reason to be encouraged.
We now know enough about speech to be able to identify some of its most
distinctive characteristics; those characteristics, that is, that most clearly
imply the existence of specialized linguistic processes. As a result, we can
fruitfully make comparisons with nonlinguistic processes in man and with any
processes at all in prelinguistic infants and (presumably) nonlinguistic
animals. Indeed, the comparisons are, for obvious reasons, easier to make at the
level of speech than at the level of syntax, especially with infants and
animals. Moreover, we have started to make those comparisons. But we have only
just started. There are hundreds of experiments out there waiting to be done.
Until we see what results they produce, we would be well advised, I think, to
suspend judgment.

REFERENCES

Cooper, F. S. (1972) How is language conveyed by speech? In Language by Ear
and by Eye, ed. by J. F. Kavanagh and I. G. Mattingly. (Cambridge, Mass.:
MIT Press), pp. 25-45.

Cooper, W. E. (1975) Selective adaptation to speech. In Cognitive Theory,
vol. 1, ed. by F. Restle, R. M. Shiffrin, N. J. Castellan, H. R. Lindman,
and D. B. Pisoni. (Hillsdale, N. J.: Lawrence Erlbaum Assoc.), pp. 23-54.

Darwin, C. J. (in press) The perception of speech. In Handbook of Perception,
vol. 7, ed. by E. C. Carterette and M. P. Friedman. (New York: Academic
Press). [Also in Haskins Laboratories Status Report on Speech Research
SR-42/43 (1975), 59-102.]






Eimas, P. D. and J. D. Corbit. (1973) Selective adaptation of linguistic
feature detectors. Cog. Psychol. 4, 99-109.

Elliott, L. L. (1971) Backward and forward masking. Audiology 10, 65-76.

Fant, C. G. (1962) Descriptive analysis of the acoustic aspects of speech.
Logos 5, 3-17.

Ganong, W. F. (1975) An experiment on "phonetic adaptation." Quarterly
Progress Report (Research Laboratory of Electronics, MIT) 116, 206-210.

Kay, R. H. and D. R. Matthews. (1972) On the existence in human auditory
pathways of channels selectively tuned to the modulation present in
frequency-modulated tones. J. Physiol. (London) 225, 657-677.

Kuhl, P. K. and J. D. Miller. (1975) Speech perception by the chinchilla:
Voiced-voiceless distinction in alveolar plosive consonants. Science 190,
69-72.

Leshowitz, B. and E. Cudahy. (1973) Frequency discrimination in the presence of
another tone. J. Acoust. Soc. Am. 54, 882-887.

Liberman, A. M. (1974) The specialization of the language hemisphere. In The
Neurosciences: Third Study Program, ed. by F. O. Schmitt and F. G. Worden.
(Cambridge, Mass.: MIT Press), pp. 43-56. [Also in Haskins Laboratories
Status Report on Speech Research SR-31/32 (1972), 1-22.]

Liberman, A. M. (in press) How abstract must a motor theory of speech be?
Paper delivered at the 8th International Congress of Phonetic Sciences,
Leeds, 21 August 1975. [Also in Haskins Laboratories Status Report on
Speech Research SR-44 (1975), 1-15.]

Lisker, L., A. M. Liberman, D. Dechovitz, and D. Erickson. (1975) On pushing
the voice-onset-time boundary about. J. Acoust. Soc. Am., Suppl. 57, S50(A).
[Also in Haskins Laboratories Status Report on Speech Research SR-42/43
(1975), 257-264.]

Miller, J. D., R. E. Pastore, C. C. Wier, W. J. Kelly, and R. J. Dooling. (1974)
Discrimination and labeling of noise-buzz sequences with various noise-lead
times. J. Acoust. Soc. Am., Suppl. 55, S390(A).

Sawusch, J. R., D. B. Pisoni, and J. E. Cutting. (1974) Category boundaries
for linguistic and nonlinguistic dimensions of the same stimuli. Research
on Speech Perception (Department of Psychology, Indiana University) 1,
162-173.

Sinnott, J. M. (1974) Human versus monkey discrimination of the /ba/-/da/
continuum using three-step paired comparisons. J. Acoust. Soc. Am., Suppl.
55, S55(A).

Stevens, K. N. and A. S. House. (1972) The perception of speech. In
Foundations of Modern Auditory Theory, vol. 2, ed. by J. Tobias. (New York:
Academic Press), pp. 3-62.

Stevens, K. N. and D. H. Klatt. (1974) Role of formant transitions in the
voiced-voiceless distinction for stops. J. Acoust. Soc. Am. 55, 653-659.

Studdert-Kennedy, M. (1974) The perception of speech. In Current Trends in
Linguistics, ed. by T. A. Sebeok. (The Hague: Mouton). [Also in Haskins
Laboratories Status Report on Speech Research SR-23 (1970), 15-48.]

Wilson, W. A. and R. S. Waters. (1975) How monkeys perceive some sounds of
human speech. Paper read at a meeting of the American Psychological
Association, Chicago, September.



Consonant Environment Specifies Vowel Identity*

Winifred Strange, Robert R. Verbrugge, Donald P. Shankweiler, and
Thomas R. Edman




ABSTRACT 



Past studies have shown that while vowels can be produced with
static vocal-tract configurations, the resulting steady-state tokens
are misidentified frequently by naive listeners. The first experiment
compared the perception of isolated vowels with vowels spoken in
a fixed consonantal frame by the same set of 15 talkers. Vowels in
/p-p/ syllables were identified with far greater accuracy than were
comparable isolated vowels in both single and multiple talker
conditions. Acoustical analyses of the test tokens showed that the poor
intelligibility of isolated vowels could not be attributed to talkers'
failure to produce these vowels correctly. In a second experiment,
vowels in syllables in which the initial and final stop consonant
varied unpredictably from item to item were still identified with
greater accuracy than were isolated vowels. These results offer
strong evidence that dynamic acoustic information distributed over
the temporal course of the syllable is used regularly by the listener
to identify vowels.



*A partial summary of these results was presented at the 87th meeting of the
Acoustical Society of America, New York, 25 April 1974, and published in
Strange, Verbrugge, and Shankweiler (1974). A more complete exposition of
the problem of perceptual constancy in speech perception may be found in
Shankweiler, Strange, and Verbrugge (in press).

University of Minnesota, Minneapolis.

Also University of Connecticut, Storrs.

Acknowledgment: This paper reports research begun during the academic year
1972-1973 while D. Shankweiler was a guest investigator at the Center for
Research in Human Learning, University of Minnesota, Minneapolis. The work
was supported by grants to the Center and to Haskins Laboratories from the
National Institute of Child Health and Human Development, by grants awarded
to D. Shankweiler and J. J. Jenkins by the National Institute of Mental
Health, and by a fellowship to R. Verbrugge from the University of Michigan
Society of Fellows. We wish to thank Kevin Jones, Kathleen Briggs, and Robert
Jenkins for their assistance in the experimental work, and James Jenkins for
his advice and encouragement throughout this research.



[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]






INTRODUCTION 



Vowels, unlike consonants, can be produced and identified in isolation.
This possibility was exploited early in the investigation of vowel quality, as
witnessed by studies of the cardinal vowels (Jones, 1956). Sustained,
"steady-state" vowels can be classified by frequencies of the first two or three
formants (Potter and Steinberg, 1950). So successful were the efforts to locate
the acoustic information sufficient for the perception of sustained vowels that
the main focus of research on speech perception shifted to the search for the
consonantal cues. But the supposition that the sound pattern is simpler in the
case of the vowels than the consonants is unsupportable if a distinction is made
between the sustained, isolated vowel and the vowel as it occurs in natural
speech.

Although they can be produced in a quasi-steady-state manner and in
isolation, vowels so produced must be regarded as laboratory artifacts.
Ordinarily, vowels occur in coarticulation with consonants, in the context of
the syllable. The acoustic information in coarticulated vowels is fused and
carried in parallel with the consonantal information. (See Liberman, Cooper,
Shankweiler, and Studdert-Kennedy, 1967; Liberman, 1970.) It was discovered long
ago in tape-cutting experiments of Schatz (1954) and Harris (1953) that vowel
quality cannot be discretely localized in any single portion of the syllable,
but is distributed throughout the period during which voicing is present.

Studies of perturbations of formant frequencies brought about by uttering
vowels in the context of syllables were carried out by Shearme and Holmes
(1962), Lindblom (1963), Stevens and House (1963), and Ohman (1966). These
investigations demonstrated that steady-state values of the formants are rarely
attained because articulatory movement is more or less continuous. Thus, the
acoustic description of vowels in ordinary speech is a good deal more complex
and problematic than is revealed by the classic studies of the acoustic basis of
vowel quality.

If the acoustic structure of the isolated vowel often differs greatly from
the "same vowel" in context, it might be inferred that different cues are
employed in vowel perception when the vowel is in consonantal context and when
it occurs in isolation. It is all the more interesting, therefore, to find
indications in the phonetic literature that isolated vowels are difficult to
perceive. For example, Fairbanks and Grubb (1961) presented nine isolated vowels
produced by phonetically trained talkers to experienced listeners. The overall
identification rate was only 74 percent, which contrasts strikingly with a rate
of 94 percent obtained by Peterson and Barney (1952) for perception of vowels in
/h-d/ context. Somewhat better identification of isolated vowels was obtained by
Lehiste and Meltzer (1973), with only three talkers producing the tokens.
Fujimura and Ochiai (1963) directly compared the identifiability of vowels in
consonantal context and in isolation. They found that the center portions of
vowels, which had been gated out of consonant-vowel-consonant (CVC) syllables,
were less intelligible in isolation than in syllabic context. These findings
suggest that isolated vowels are misidentified with significantly higher
frequency than vowels spoken in at least some consonantal environments. Could it
be that the acoustic complexities introduced by syllabic structure better serve
the requirements of the perceptual apparatus than do quasi-steady-state
formants? If so, then it is surely inappropriate to characterize the cues for
vowel identity in terms of static points in a space defined by the first two
formants.



It seemed important, therefore, to attempt to demonstrate under carefully
controlled experimental conditions that vowels in consonantal contexts are
perceived with fewer errors than "the same vowels" presented in isolation. A
further purpose of the research reported here was to investigate the sources of
information within the CVC syllable that specify the vowel and to explore how
that information is used by the perceiver in the process of perception. If it
is true that consonantal environment generally aids in identification of a
vowel, we recognize that there is more than one way the environment might play
a facilitating role. One possibility is that portions of the signal commonly
regarded as consonantal, such as transitions, might aid in normalization for
vocal-tract differences. Experiments by Fourcin (1968) and Rand (1971) have
found that perceptual boundaries between stop consonants vary depending on the
vocal tract presumed to have produced a syllable. The phonemic identity of the
consonants was fixed and known in advance in the Peterson and Barney (1952)
study and in our own investigations (Verbrugge, Strange, Shankweiler, and Edman,
in press). In these cases, the transitions may have allowed listeners to scale
the formant frequencies of the medial vowel according to the vocal-tract
characteristics of the talker and thus reduce vowel ambiguity.

On the other hand, isolated vowels may be difficult to perceive for a more
fundamental reason. It is possible that listeners ordinarily rely upon
information distributed throughout the whole syllable for identification of the
vowel. This seems likely in view of parallel transmission of the consonants and
the vowel. If it is the case that syllable-initial and syllable-final
transitions specify the vowel as well as the consonants, we could assert that
the vowel is inseparable from the syllable, that it is not specified by formant
frequencies at any particular cross section in time, but rather is carried in
the dynamic configuration of the whole syllable. In this case the presence of
transitions should aid identification of the intended vowel whatever additional
difficulties may be posed by confronting the listener with multiple vocal
tracts.

EXPERIMENT I: PERCEPTION OF ISOLATED AND MEDIAL VOWELS

If consonantal environment aids in specifying vowel identity in either of
the two ways postulated above, we would expect that the perception of isolated
vowels would be less accurate than the perception of medial vowels in listening
tests where the tokens on a test were produced by different talkers. Previous
studies on the identification of steady-state vowel stimuli support this
hypothesis (Fairbanks and Grubb, 1961; Lehiste and Meltzer, 1973). However,
these investigations do not directly compare isolated vowels with vowels in
syllable frames when the number and type of talkers, number of response
alternatives, and other factors are held constant. Millar and Ainsworth (1972)
report that listeners were able to identify synthetically generated vowels more
reliably and uniformly when the vowels were embedded in /h-d/ words than when
the acoustically identical segments were presented in isolation. We are not
aware of any studies that directly compare the perception of naturally produced
isolated vowels with vowels in context.

The present study compares the identifiability of vowels produced in a
fixed consonantal frame with isolated vowels when (1) a single talker produced
all tokens on a particular listening test (Segregated Talker condition) and
(2) when tokens produced by several different talkers are presented in random
order (Mixed Talker condition). By independently varying these two factors
(consonantal context and talker variation), we can assess the relative
contribution of each to the accuracy of vowel identification. Further, the
design allows us to test the two hypotheses regarding the way in which
consonantal information may be utilized. If consonantal environment aids in
vowel identification by serving as a calibration signal for vocal-tract
normalization, we expect an interaction between the two major variables. That
is, we expect that the loss in identifiability of vowels due to the absence of
consonantal transitions will be more severe in those tests where talker identity
changes, since recalibration is necessary on each trial. We expect no
significant disadvantage of the absence of consonantal transitions for those
tests in which talker identity is unchanged. Alternatively, if consonantal
transitions provide information that specifies vowel identity independent of
talker normalization, we expect no such interaction. The identification of
isolated vowels should be less accurate than of vowels in consonantal context
both for tests on which the talker remains constant and for tests on which
talkers are mixed.

This study compares listeners' performance on isolated vowel tests with
the results reported previously for medial vowels spoken in /p-p/ environment
(Verbrugge et al., in press). The tests were directly comparable on all
factors, such as identity of talkers, order of presentation of alternatives,
response alternatives, and recording and reproduction conditions.

Method 

Stimulus materials. The panel of talkers described in our previous research
was also used for this study. Five men, five women, and five children, none of
whom were trained speakers, were selected to represent a wide variety of
vocal-tract sizes and characteristic fundamental frequencies. According to the
judgment of the experimenters, the talkers represented a fairly homogeneous
dialect group, that of the upper midwest region of the United States, from which
the listeners were also drawn.

The materials for the /p-p/ tests (Mixed and Segregated Talker) were those
described in Verbrugge et al. (in press: Exp. II). Talkers read the test
syllables, which were printed individually on cards. The /p-p/ words were also
used to represent the isolated vowels; talkers were instructed to pronounce the
vowels as they would be pronounced in these key words. They were given one
practice trial and were instructed to produce the tokens quite rapidly. Each
talker produced one token of each of nine isolated vowels: /i/, /ɪ/, /ɛ/, /æ/,
/ɑ/, /ɔ/, /ʌ/, /ʊ/, /u/.

For the Mixed Talker Isolated Vowel test (Mixed /#-#/) three of the nine
vowels were selected for each talker, corresponding to the three vowels he
produced for the /p-p/ test. As in the earlier test, vowels were assigned to
talkers randomly with the constraint that each talker contributed only one of
the point vowels. Thus, the Mixed /#-#/ test consisted of five tokens of each of
nine vowels; each of the five tokens was spoken by a different talker.

The Segregated Talker Isolated Vowel tests (Segregated /#-#/) were comparable
to the Segregated Talker /p-p/ tests described in Verbrugge et al. (in press:
Exp. II). One man, one woman, and one child each produced a 45-item test that
contained five different tokens of each of the nine vowels.



All test stimuli were recorded in a sound-attenuated experimental room with
a ReVox A77 stereo tape recorder and Spher-o-dyne microphone. The 45 tokens on
a test were arranged in a random presentation order with the restrictions that
the same intended vowel did not appear more than twice consecutively, and
tokens produced by the same talker were separated by not less than eight tokens
(in the Mixed tests). Identical procedures were used to construct each of the
four tests so that presentation order, timing, and peak intensity of test tokens
were identical for all tests.
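
The two ordering restrictions can be expressed as a simple check on a candidate presentation order. The following Python sketch is illustrative only: the function names and talker labels are hypothetical, and it assumes that "separated by not less than eight tokens" means at least eight intervening tokens. It does not reproduce the procedure actually used to build the tapes.

```python
from typing import Dict, List, Optional, Tuple

# (talker label, intended vowel); the labels are hypothetical placeholders.
Token = Tuple[str, str]


def longest_vowel_run(order: List[Token]) -> int:
    """Length of the longest run of consecutive tokens with the same vowel."""
    longest = run = 0
    previous = None
    for _, vowel in order:
        run = run + 1 if vowel == previous else 1
        longest = max(longest, run)
        previous = vowel
    return longest


def smallest_talker_gap(order: List[Token]) -> Optional[int]:
    """Fewest intervening tokens between two tokens by the same talker.

    Returns None when no talker appears more than once.
    """
    last_seen: Dict[str, int] = {}
    smallest: Optional[int] = None
    for position, (talker, _) in enumerate(order):
        if talker in last_seen:
            gap = position - last_seen[talker] - 1
            smallest = gap if smallest is None else min(smallest, gap)
        last_seen[talker] = position
    return smallest


def order_is_acceptable(order: List[Token], max_run: int = 2,
                        min_gap: int = 8, mixed: bool = True) -> bool:
    """Apply the vowel-run limit and, for Mixed tests, the talker spacing."""
    if longest_vowel_run(order) > max_run:
        return False
    if mixed:
        gap = smallest_talker_gap(order)
        if gap is not None and gap < min_gap:
            return False
    return True
```

A random shuffle of the 45 tokens could be passed through `order_is_acceptable` and redrawn or repaired when it fails; the talker-spacing rule applies only to the Mixed tests, so `mixed=False` skips it.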

Procedure. Listening tests were presented to small groups of subjects in
a quiet experimental room via a Crown CX 822 tape recorder, Macintosh MC40
amplifier, and AR acoustic suspension loudspeaker. Listeners responded on
score sheets that contained nine response alternatives, written out in full in
each row: "pip, pup, pap, peep, pop, pep, poop, pawp, puup." Before the tests,
the experimenter pronounced each of the nine key words, drawing special
attention to the last word, "puup," which stood for the syllable /pʊp/. For the
/#-#/ tests, the experimenter pronounced each key word followed by the vowel in
isolation, again with special attention to the /ʊ/ alternative. Subjects in the
Mixed Talker conditions were told they would hear "several different talkers";
subjects in the Segregated Talker conditions knew they would hear only one
voice on each 45-token test.

Independent groups of subjects responded to the /p-p/ and the /#-#/ Mixed
Talker tests. Each group of subjects completed two repetitions of the 45-token
test for a total of 90 judgments per subject, 10 on each intended vowel. In
the Segregated Talker conditions, three groups of subjects heard the /p-p/ tests
and another three groups heard the /#-#/ tests. The order of presentation of the
Man (M), Woman (W), and Child (C) tests was counterbalanced across the groups
in the orders: MWC, WCM, CMW. Data for only the first two tests were analyzed
(i.e., MW, WC, and CM, respectively). Thus, the total number of judgments by
the Segregated test subjects was equivalent to that for the Mixed test subjects
(90 judgments) and any effects of fatigue or familiarity were equally
distributed across the three talkers for the Segregated tests.
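
The counterbalancing scheme can be checked with a few lines of Python. This is a small sketch rather than anything taken from the study's materials; the variable names are illustrative. It builds the three cyclic orders, keeps the first two tests of each, and confirms that every talker's test is scored once in first position and once in second position.

```python
from collections import Counter

TOKENS_PER_TEST = 45
TESTS = ["M", "W", "C"]  # Man, Woman, and Child tests


def rotations(items):
    """Return every cyclic rotation of a list, starting with the list itself."""
    return [items[i:] + items[:i] for i in range(len(items))]


orders = rotations(TESTS)                    # one presentation order per group
analyzed = [order[:2] for order in orders]   # only the first two tests are scored

# How often each test is scored in first (0) and second (1) position.
position_counts = Counter(
    (test, position) for pair in analyzed for position, test in enumerate(pair)
)
judgments_per_subject = TOKENS_PER_TEST * 2

for order, pair in zip(orders, analyzed):
    print("".join(order), "->", "".join(pair))
print(judgments_per_subject, "judgments per subject")
```

Because each test appears equally often in each scored position, order effects such as fatigue or familiarity are spread evenly over the three talkers, which is the property the design relies on.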

Subjects. The data presented here for the /p-p/ conditions are those
obtained in the previous study (Verbrugge et al., in press: Exp. II).
Thirty-three subjects served in the Segregated /p-p/ tests (11 in each
condition) and 19 subjects were tested on the Mixed /p-p/ test. For the tests on
isolated vowels, 30 subjects were tested in the Segregated /#-#/ test (10 per
condition) and 16 subjects heard the Mixed /#-#/ test. All subjects were paid
volunteers from undergraduate psychology classes at the University of Minnesota.
All were native speakers of English and most were natives of the upper midwest
region.

Results

Errors in vowel identification were tabulated for each condition; an error
was defined as the selection of a response other than that intended by the
talker. The overall error rate for the four experimental conditions is shown
in Figure 1. The main comparison of interest is between performance reported
earlier for vowels in /p-p/ environment and performance on the isolated vowels.
On the average, there were 17.0 percent errors on the Mixed /p-p/ test and 9.5
percent errors on the Segregated /p-p/ test. For the isolated vowels, on the
other hand, there were 42.6 percent errors on the Mixed test and 31.2 percent
errors on the Segregated test. Errors summed over all nine vowels for each
subject were submitted to a 2 x 2 analysis of variance for unequal cell
frequencies. The main effects for talker variation (Mixed vs. Segregated) and
consonantal context (/p-p/ vs. /#-#/) were both significant [F(1,94) = 21.18 and
125.17, respectively, p < .01]. However, no significant interaction between the
two variables was found [F(1,94) = 0.$3].



Figure 1: Overall percent errors for vowels in /p-p/ syllables and isolated
          vowels. Open bars show errors for Segregated Talker conditions;
          shaded bars show errors for Mixed Talker conditions.



These results indicate that while talker variation does contribute
significantly to vowel identification errors for both medial vowels and isolated
vowels, the presence or absence of consonantal context is by far the more
important variable. Listeners misidentified approximately three times as many
isolated vowel tokens as they did the corresponding medial vowels. Thus, it
appears that the presence of a consonantal environment is much more critical for
accurate vowel identification than is familiarity with the characteristics of
the talkers' vocal tracts.

The hypothesis that consonantal environment contributes to perception of
the vowel by providing cues for talker normalization was not supported. There
was no interaction between the two major variables; the increased error rate due
to the absence of consonantal context was almost as great when the talker was
constant (an increase of 22 percent) as it was when talkers varied from token
to token (an increase of 26 percent). We can conclude that the efficacy of the
/p-p/ context in aiding vowel identification is directly involved with
specification of vowel identity.
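
The two increases quoted above follow directly from the four overall error rates reported in the Results. The short Python sketch below redoes that arithmetic; the dictionary layout and function name are illustrative choices, and the final contrast is only a descriptive difference of differences, not the analysis of variance itself.

```python
# Overall percent errors reported in the text for the four conditions.
error_rates = {
    ("segregated", "/p-p/"): 9.5,
    ("segregated", "/#-#/"): 31.2,
    ("mixed", "/p-p/"): 17.0,
    ("mixed", "/#-#/"): 42.6,
}


def context_cost(talker_condition: str) -> float:
    """Increase in percent errors when the consonantal context is removed."""
    isolated = error_rates[(talker_condition, "/#-#/")]
    medial = error_rates[(talker_condition, "/p-p/")]
    return isolated - medial


segregated_cost = context_cost("segregated")  # about 22 points
mixed_cost = context_cost("mixed")            # about 26 points

# Descriptive difference of differences; a normalization account would
# predict this to be large, with little or no cost for segregated talkers.
interaction_contrast = mixed_cost - segregated_cost

print(f"segregated: {segregated_cost:.1f}")
print(f"mixed:      {mixed_cost:.1f}")
print(f"contrast:   {interaction_contrast:.1f}")
```

The cost of removing the context is nearly the same in both talker conditions, which is what the absence of a significant interaction reflects.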




A vowel-by-vowel analysis of the identification errors for the four
experimental conditions is presented in Table 1. (Confusion matrices for the
/p-p/ and /#-#/ tests are presented in Appendices A-1, A-2, A-3, and A-4.) It is
readily apparent that for every vowel category, in both Mixed and Segregated
Talker conditions, there were more errors for the isolated vowel than for the
corresponding vowel in the /p-p/ frame. This is strong evidence that the lack of
familiarity with a talker's vocal tract is far less detrimental to accurate
perception of vowels than is the absence of information provided by a
consonantal environment.



TABLE 1: Experiment I: Identification errors (in percent) for each intended
         vowel in four experimental conditions. Error rates excluding /ɑ/-/ɔ/
         confusions are given in parentheses. (See Footnote 1.)

                     Segregated Talkers          Mixed Talkers
Intended Vowel       /#-#/       /p-p/         /#-#/       /p-p/

      i              16          < 1           26           1
      ɪ              14            4           23           2
      ɛ              46           12           62          27
      æ              26            2           48          19
      ɑ              64 (19)      23 (4)       61 (32)     20 (10)
      ɔ              29 (14)      18 (2)       30 (10)     27 (3)
      ʌ              42            8           63          15
      ʊ              29           18           49          39
      u              14          < 1           23           3

Overall errors       31% (25)      9% (6)      43% (38)    17% (13)



The data reveal differences in the identifiability of particular isolated
vowels. The pattern of errors is quite similar to that found for medial vowels;
the vowels /i/, /ɪ/, and /u/ are most accurately identified, while the more
central vowels yield relatively more errors in identification. It should be
noted, however, that even the former show error rates from 14 to 26 percent when
they are presented without consonantal context, compared to less than 4 percent
errors obtained for these vowels in the /p-p/ context.



A more detailed analysis was undertaken to evaluate the consistency of
these results. The percent errors obtained for each of the 45 tokens on the
Mixed /#-#/ test was compared to the percent errors obtained for the comparable
token on the Mixed /p-p/ test. Isolated vowel tokens were misidentified more
often than medial vowels in 39 out of 45 cases, while two pairs produced an
equal proportion of errors. In only four cases did the /p-p/ token produce
more errors than the comparable isolated vowel. Thus, we can conclude that the
difference in error rates found between performance on medial and isolated
vowels is consistent across individual tokens of the vowels as well as across
vowel categories.

¹The extremely high error rate for the vowel /ɑ/ is, in part, due to the
considerable confusion between /ɑ/ and /ɔ/ in the dialect of the talkers. In
Table 1 the percentages shown in parentheses for these two vowels represent
the error rates excluding /ɑ/-/ɔ/ confusions; that is, a response was counted
correct if the subject identified an intended /ɑ/ either as /ɑ/ or as /ɔ/, and
likewise for an intended /ɔ/. Adjusted overall error rates, also presented in
Table 1, show that subtracting /ɑ/-/ɔ/ confusions has little effect on the
relative differences among the four conditions.
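
The adjusted scoring described in Footnote 1 (the parenthesized rates in Table 1) amounts to treating /ɑ/ and /ɔ/ as a single response category. A minimal Python sketch of that rule follows; the function names are hypothetical, and the handful of trials in the usage note is invented purely to show the mechanics, not taken from the confusion matrices.

```python
from typing import Iterable, Tuple

# The two vowels that are treated as interchangeable in the adjusted score.
MERGED = frozenset({"ɑ", "ɔ"})


def is_error(intended: str, response: str, merge: bool = False) -> bool:
    """True when the response counts as an error under the chosen rule."""
    if intended == response:
        return False
    if merge and intended in MERGED and response in MERGED:
        return False  # an /ɑ/-/ɔ/ confusion is forgiven in the adjusted score
    return True


def percent_errors(trials: Iterable[Tuple[str, str]], merge: bool = False) -> float:
    """Percent of (intended, response) pairs scored as errors."""
    trials = list(trials)
    if not trials:
        raise ValueError("no trials to score")
    errors = sum(is_error(intended, response, merge) for intended, response in trials)
    return 100.0 * errors / len(trials)
```

For example, with the invented trials `[("ɑ", "ɑ"), ("ɑ", "ɔ"), ("ɑ", "ʌ"), ("ɔ", "ɑ")]`, strict scoring counts three errors out of four, while the adjusted rule counts only the /ʌ/ response as an error.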



The overall results of the Segregated tests show that isolated vowels were
identified far less accurately than were medial vowels, even when talker
variation was absent. Error rates for the man, woman, and child on the
Segregated /#-#/ tests were 33, 26, and 32 percent, respectively. Comparable
error rates for the Segregated /p-p/ tests, reported in Verbrugge et al. (in
press), were 9, 6, and 11 percent, respectively. The differences show a
relatively constant advantage of consonantal environment for all three talkers,
despite some variability in overall intelligibility of the talkers.

In summary, it is clear that consonantal environment contributes in a
major way to the identification of vowels. We reach this conclusion whether we
regard the data in terms of overall results, the results for particular vowel
categories, for individual tokens, or for individual talkers. Isolated vowels
are much more poorly identified than vowels embedded in the /p-p/ context.

Acoustical Analysis 

The results of this experiment indicate that isolated steady-state vowels
are poor stimuli from the standpoint of the perceiver. The possibility remains,
however, that the perceptual problem in identifying isolated vowels is a result
of the way the talkers produced them. Phonetically untrained talkers may be
unable to produce specified tokens of vowels reliably in isolation. Acoustical
analysis of the vowel utterances by our panel of talkers was undertaken to
investigate this possibility.

Center frequencies of the first three speech formants and the duration of
the vocalic portion of each syllable were determined from spectrograms and
spectral sections produced on a Voiceprint Sound Spectrograph. Recordings of
tokens produced by women and children were reproduced at half-speed for
spectrographic analysis; obtained frequency values were doubled to determine the
actual formant frequencies of these tokens. Spectral sections were made at the
point of nearest approach to the steady state. (If the vowel was diphthongized
by the talker, measurements were obtained from the initial part of the vocalic
portion of the syllables.) Two judges, working independently, determined the
center frequency values for the speech formants to the nearest 25 Hz.
Frequencies reported represent an average of the values obtained by the two
judges. In addition, measurements of the duration of the first-formant periodic
energy were made.²
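
The measurement arithmetic described above (correcting for half-speed playback, reading to the nearest 25 Hz, and averaging the two judges) can be sketched in Python as follows. The function names are hypothetical, the readings in the usage note are made up for illustration, and the order in which rounding and speed correction were applied in practice is an assumption.

```python
import math


def actual_frequency(measured_hz: float, playback_speed: float = 1.0) -> float:
    """Undo the frequency scaling caused by slowed playback.

    At half speed (playback_speed=0.5) every frequency is halved on the
    spectrogram, so the measured value is doubled.
    """
    if playback_speed <= 0:
        raise ValueError("playback_speed must be positive")
    return measured_hz / playback_speed


def round_to_step(value_hz: float, step_hz: float = 25.0) -> float:
    """Round to the nearest multiple of step_hz (exact halves round upward)."""
    return step_hz * math.floor(value_hz / step_hz + 0.5)


def reported_value(judge_a_hz: float, judge_b_hz: float) -> float:
    """Average of the two judges' independent determinations."""
    return (judge_a_hz + judge_b_hz) / 2.0
```

For instance, if two judges read 1400 Hz and 1375 Hz from a half-speed spectrogram of a child's token, the corrected values are 2800 Hz and 2750 Hz and the reported value is their mean, 2775 Hz.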



²For many isolated vowels and some vowels in /p-p/ frames, the offset of
periodic energy preceded the offset of higher formant energy considerably.
However, the rank order of vowels within each listening condition was the same
even when the duration of higher formant energy was considered. Thus, the
conclusions discussed in the text are valid for both measures of duration.



Measurements were obtained for the 45 tokens of the Mixed Talker /p-p/ test
and the 45 isolated vowel tokens in the Mixed /#-#/ test. In addition,
measurements were obtained for the remaining six isolated vowels spoken by each
talker that were not incorporated in the Mixed /#-#/ test. Thus, one token of
each of nine isolated vowels was measured for each of 15 talkers. For the
Segregated tests, one token of each of the nine isolated vowels was selected
randomly from each of the three talkers' tests. For comparison, the /p-p/ token
that corresponded to each selected isolated vowel was also analyzed.

TABLE 2: Average frequency values (in Hz) for the first three speech formants
         of the nine isolated vowels, averaged over five talkers in each group.

              i      ɪ      ɛ      æ      ɑ      ɔ      ʌ      ʊ      u

F1    M     355    447    635    737    757    672    685    497    3^7
      W     385    482    747    82f    843    692    815    577    435
      C     357    580    755     8^   1030    770    895    557    500

F2    M    2245   1960   1790   1697   1220    942   116?   1092   1042
      W    2792   2325   2157   2110   1372   1312   1525   1399   1175
      C    3335   2710   2485   2685   1565   1350   1630   1340   1150

F3    M    2937   2575   2510   2445   2347   2453   2307   2352   2165
      W    3482   3060   2960   2900   2915   2875   2847   2815   2735
      C    3880   3630   3765   3680   3700   3540   3725   3613   3150


Looking first at the analysis of the isolated vowels spoken by the full
panel of talkers, we can ask whether the poor identification (43 percent errors)
was due to the talkers' inability to produce isolated vowels reliably. Table 2
presents the average values of the first three speech formants for the men,
women, and children. In Figure 2 the average values for the first and second
speech formants are plotted in a two-dimensional "vowel space." On the average,
our talkers' productions of the vowels in isolation were systematic in
distribution and corresponded closely in formant values to vowels sampled by
other investigators (Peterson and Barney, 1952; Tiffany, 1959; Stevens and
House, 1963). The formant frequencies showed systematic elevations from men to
women to children, reflecting a general decrease in the size of these talkers'
vocal tracts.

Individual tokens of isolated vowels corresponded closely to values reported
in previous studies except for tokens of the vowel /ɔ/ by all talkers, tokens of
/ɛ/ spoken by the men and women, three tokens of /æ/ spoken by children, and one
token of /u/ spoken by a woman. The deviation in /ɔ/ tokens represents a
dialectal difference between our talkers and those recorded by Peterson and
Barney (1952). Stevens and House (1963) did not report data for this vowel.

Figure 2: Average Formant 1/Formant 2 values for isolated vowels spoken by
          men, women, and children (five talkers in each group).

The next question of interest is whether the panel's productions of
isolated vowels differed greatly from their corresponding productions of vowels
in the /p-p/ consonantal frame. To answer this question, we compared the tokens
actually used in the two Mixed Talker tests. Figure 3 presents the average
values of F1 and F2 for the medial vowels and isolated vowels, pooled across
men, women, and children. The vowels on the two tests occupied almost the same
area in F1/F2 space. The second formant of the medial vowels showed a slight
migration toward the center of the space. This is an expected result of
coarticulation (where formants fail to reach a steady-state target) and is in
accord with results reported by Stevens and House (1963) for vowels produced
between consonants with labial and labio-dental place of articulation. As
Tiffany (1959) noted, this reduces the acoustic contrast among vowels spoken in
a consonantal frame in comparison to isolated vowels. However, the perceptual
data demonstrate that identifiability cannot be predicted from the spread of
steady-state formant measurements; medial vowels were perceptually much more
distinct than vowels in isolation (83 vs. 57 percent correct identifications).

The two sets of vowels were very similar in formant frequencies, in both
the central tendency and the variability of values for each vowel. Even so,
there were a few individual tokens that deviated markedly from the central
tendencies. It is of interest whether the considerably greater error rate for
isolated vowels over that obtained for medial vowels can be attributed primarily
to the misidentification of tokens that were produced in a deviant manner.

Figure 3: Average Formant 1/Formant 2 values for vowels in /p-p/ syllables
          (solid lines) and vowels in isolation (dashed lines). Values were
          computed over the five tokens of each vowel in each Mixed Talker
          test.

One way to answer this question is to look at those pairs of tokens that
contributed most to the difference obtained in the perceptual tests. For nine
comparison pairs, errors for the isolated vowel exceeded those for the medial
vowel by more than 50 percent of the opportunities for error. It might be
supposed that the formant frequencies of these isolated vowel tokens would show
the greatest deviation from the average values and from values for the
comparable medial vowel. This is not the case, however, as may be seen from
Figure 4, which shows the nine vowel pairs. For some of these pairs, the first-
and second-formant values for both the isolated and medial vowels fell within
the range of variation for the appropriate vowel category. For the vowels /æ/,
/a/, and /i/, both isolated and medial vowels were displaced from their typical
positions. Finally, for the vowels /u/, /ɛ/, and one pair of /a/, the isolated
vowel might be considered less confusable acoustically than its counterpart in
medial position. Thus, there seems to be no close correspondence between
perceptual confusability and acoustic deviation from some expected (target)
value.

This does not mean, of course, that variations in formant frequency
positions have no effect on perception. There were a few pairs of tokens that
were "misarticulated" on both the /p-p/ and /#-#/ tests and that contributed
relatively greater numbers of errors in identification. (For example, one
woman's production of 1x3 1 was quite deviant on the medial vowel test, as well
as on the isolated vowel test. Listeners made 38 and 100 percent errors on the
isolated and medial tokens, respectively.) However, with respect to the present
comparison, the salient point is that deviation in formant structure cannot
account for the large and consistent differences between perceptual tests of
isolated vowels and vowels in a fixed consonantal frame.

Figure 4: Formant 1/Formant 2 values for the nine pairs of vowels on the
          Mixed Talker tests that contributed most to the difference in
          identification errors. Vowels in /p-p/ syllables are indicated
          by the subscript p.

Measurements of formant frequencies of tokens from the Segregated Talker
tests corroborate the results for the Mixed Talker tests. Since measurements
were made for only a sample of the total set of items, we cannot be sure that
deviations in the production of isolated vowels were not responsible for their
inferiority as perceptual targets. However, the tokens that were measured gave
no indication that the three talkers produced the isolated vowels less
consistently than they did the medial vowels. A comparison of pairs of tokens
showed that isolated and medial vowels were similar in all but a few cases.
Deviations from the normal range of formant values were as likely to be obtained
for a randomly selected medial vowel as they were for a randomly selected
isolated vowel. Thus, the consistent advantage found in perceptual tests for
medial vowels over isolated vowels, for all three talkers and all nine vowel
categories, cannot be attributed to deviant formant frequencies of isolated
vowels.

While there was no indication of large differences in the formant structure
of the vowels in isolation and those in syllables spoken in citation form, these
two sets of tokens did differ considerably in terms of overall duration.
Table 3 gives the average duration of the voiced first formants of isolated and
medial vowels in Segregated and Mixed Talker tests. The isolated vowels were
much longer on the average than were the medial vowels. However, a more
important consideration is the relative durations of the vowels in the two sets.
More specifically, are the relative durations of isolated vowels different from
those typically found for vowels in consonantal context?

The relative durations of vowels in /p-p/ frames were similar to the values
reported by Peterson and Lehiste (1960) and House and Fairbanks (1953). The
vowels /ɪ/, /ɛ/, /ʌ/, and /ʊ/ were the shortest in duration; /i/ and /u/ were
intermediate; and /ɑ/, /ɔ/, and /æ/ were the longest vowels. The only exception
to this in our data was the vowel /u/ in the Segregated /p-p/ test, for which
the average duration was considerably shorter than that reported by other
researchers.



TABLE 3: Experiment I: Average durations (in msec) of the vocalic portion of
         tokens in four experimental conditions. Asterisks indicate deviant
         lengths (see text).

                     Segregated Talkersᵃ         Mixed Talkersᵇ
Intended Vowel       /#-#/       /p-p/         /#-#/       /p-p/

      i              315         128           326*        148
      ɪ              229         108           198         138
      ɛ              226         111           245*        136
      æ              328         194           256         204
      ɑ              313*        179           237         177
      ɔ              303*        186           251         18&
      ʌ              246                       184         138
      ʊ              242         124           259*        131
      u              311         109*          237         159

Overall average                  139.4         243.7       157.4

ᵃAverages based on three randomly selected tokens of each vowel, one from
each of the three talkers.

ᵇAverages based on five tokens, each spoken by a different talker.



As Table 3 indicates, relative durations for the isolated vowels were
similar to those for medial vowels with the following exceptions: for the Mixed
/#-#/ test, the vowels /i/, /ɛ/, and /ʊ/ showed longer relative durations than
they did in consonantal context. For the Segregated /#-#/ test, the vowels /ɑ/
and /ɔ/ showed shorter relative durations than their counterparts in consonantal
frames.

The atypical durations of these isolated vowels cannot account for the
consistent advantage of medial vowels over isolated vowels for every vowel
category in the perceptual tests. Even for the deviant vowels, the confusion
patterns showed no consistent trend toward responses that would be predicted on
the basis of the deviant durations. (See Appendices A-3 and A-4 for confusion
matrices.)

Discussion 

In this study we found that vowels produced in a fixed consonantal
environment were identified with much greater accuracy than were comparable
steady-state vowels produced in isolation. This was true both when variation due
to talker differences was present and when it was not. Thus, the experiment
provides no evidence that coarticulated consonants facilitate identification by
enabling the listener to recalibrate for each new talker. Coarticulated
consonants are integral to the specification of vowels whether a talker is
familiar or not.³

³It has been suggested that the relatively poor performance on the isolated
vowels might be due to the lack of correspondence between the stimuli and the



Acoustical analyses were undertaken to investigate the possibility that 
untrained talkers fail to adopt consistent targets for vowels in isolation'i re- 
sulting in a highly unreliable signal for perception. Although there were sys- 
tematic acoustic differences between vowels produced ^n consonantal environme^it 
and those produced in isolation, the large and consistent increases in confus- 
ability among isolated vowels over those obtained for medial vowels could not 
be explained by increases in the "acoustic similarity ^of vowel- categories when 
defined by formant frequencies. Nor could these differences be attributed to 
differences in the relative durations of the vowels in isolation and in context. 
It iB Unter^ting^to note* that medial , vowels tend to be more sifSll&r to'^each 
other than compsCfable isolated vowels in terms or the cross-sectional acoustic 
parameters thfit have traditionally been used to . differentiate vowel classes. 
This is additional support for the view that static descriptions of vowels are 
inadequate for capturing perceptually relevant aspects of the acoustic signals. 
Our results lead us to conclude that the acoustic information for vowel iden- 
tity, like that for consonants, is* specified in the 'dynamic configuration of 
the Syllabic pattern as a whole. 

♦ 

In this study,"* the consonantal environment in which the vowels were pro- 
duced was cpnstant across »ali tokei^s. Thus^ the listeners knew beforehand the 
identity of two of the .three phonemes in each test token.* It is possible that 
this knowledge (rather than the presence of formant contours) was the source of 
superior identification for medial vowels*. It would be of limited interest if 
co^nsonantal environment aided in vowel identification only in this circumstance, 
since it is hot generaj-ly the case that listeners have advance knowledge bf 
consonantal identity in natural listening conditionst We therefore undertook 
an additional experiment to test the effects of a varying consonantal environ- 
ment on the identification of medial vowels. 

EXPisRIMENT II: PERCEPTION OF VOWELS IN CVC SYI.LABLES 

V ' ■ ' . ^ . ' ' 

We wanted to determine whether a consonantal context that, varies from 
trial ,to trial Tand is therefore unpredictable by the listener) provide^ 

^ ' \ — 

orthographic representation of the alternatives provided on the rjasponse forms. 
For both /p-p/ and //-// ponditions, subjects were required t'o respond by select- 
ing the appropriate ,/p-p/ syllable, for example, pe.ep and pip." Thu6, subjects 
in the* //-//. condition had to "decdde" the orthography to match the isolated 
vowel, whereas subjects who heard medial vowels had only ^o match the ortho- 
graphic syllable to the perceived syllable. Since the preparation of this 
manuscript, we have used different response forms for both /p-p/ and' //-// 'tests. 
The symbols on the 'response forms corresponded to vowels in isolation, for 
examjple, EE> IH, and EH, and subject^ were giveri practice to make sure they 
could use the S3niibols appropriately. Results of these studies, when compared 
to those from condition^ using *th6 syllable response alternatives, showed no 
diffeirenc^ in performance for the isolated vowels. On the other hand, errors 
for* vowels in /p-p/ syllables were somewhat greater V7hen we used the^ isplated 
vowel symbols. However, identification of medial vowels was s^till significant- 
ly better' than for isolated vowels. ' Further studies of the effects^^of differ- 
ent r^ffponse forms are underway and will be reported in a subsequent article. 
We feel quite confi/lent that the large and consistent difference3 found in the 
present study were due primarily to .perceptual effects. 



important information for vowel identification. We again included conditions 
where the talkers varied ifrom trial to trial (Mixed) and where the same talker 
produced all tokens on a particular test (Segregate4) , in order to investigate 
the possible interaction between talker variation ^ari2l ^knowledge of consonantal 
context. 

Method

Stimulus materials. The C-C test syllables were composed from six stop
consonants, /p, t, k, b, d, g/, and the nine vowels used in Experiment I. A
panel of four adult males, four adult females, and four children (a subset of
the 15 talkers used in Experiment I) each produced six tokens for the Mixed
Talker condition, resulting in a test series of 72 syllables. Within this
series, each vowel occurred 8 times and each initial and final consonant
occurred 12 times. Consonants and vowels were paired such that each vowel was
preceded and followed by each consonant at least once. (Both symmetrical and
nonsymmetrical pairings were used; for example, syllables such as /t-t/ and
/d-t/ both appeared in the test series.) The assignment of syllables to
talkers was random with the constraint that a talker did not produce the same
vowel more than once, nor the same initial consonant more than twice.
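
The counts in this design can be checked with a few lines of arithmetic. The
sketch below only restates numbers given in the text; the variable names are
illustrative and are not taken from the report.

```python
# Counts stated in the text; names are hypothetical.
n_talkers = 12           # four men, four women, four children
tokens_per_talker = 6
n_vowels = 9
n_consonants = 6         # stop consonants

total = n_talkers * tokens_per_talker
per_vowel = total // n_vowels            # tokens of each intended vowel
per_consonant = total // n_consonants    # per position (initial or final)

# Each vowel must be preceded and followed by every consonant at least once.
# With 8 tokens per vowel and 6 consonants, 2 tokens per vowel are left over
# in each position, so the pairing requirement can be met.
spare_per_position = per_vowel - n_consonants

print(total, per_vowel, per_consonant, spare_per_position)
```

The totals agree with the 72 syllables, 8 tokens per vowel, and 12 occurrences
per consonant position reported above.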

The talkers read the test syllables from cards on which they were printed in
standard English orthography, except in cases where no unambiguous English
spelling existed. For these items, key words were provided beneath the test
syllables to indicate the pronunciation of the vowel. All test stimuli were
recorded using the equipment and procedures described in Experiment I.

The 72 test syllables were arranged in an order of presentation with the
following restrictions: (1) the same intended vowel did not occur more than
twice consecutively, (2) there was an equal number of tokens of each intended
vowel in the first and second half of the test, (3) the same initial consonant
did not occur more than twice consecutively, (4) tokens produced by the same
talker were separated by not less than six tokens, and (5) each talker
occurred equally often in the first and second half of the series. For the
Segregated Talker tests, the same three talkers were recorded as in the
Segregated tests in Experiment I. Each talker recorded the entire list of 72
syllables in the same order as for the Mixed Talker test.
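
The five ordering restrictions lend themselves to a mechanical check. The
sketch below is one possible reading, not the authors' procedure: it assumes
that "separated by not less than six tokens" means at least six intervening
tokens, and the `Token` type, the function names, and the example order are
all hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    vowel: str
    initial: str
    talker: str


def longest_run(values: List[str]) -> int:
    """Length of the longest run of identical consecutive values."""
    longest = run = 0
    previous = None
    for value in values:
        run = run + 1 if value == previous else 1
        longest = max(longest, run)
        previous = value
    return longest


def smallest_gap(values: List[str]) -> Optional[int]:
    """Fewest intervening items between two occurrences of the same value."""
    last_seen = {}
    best = None
    for index, value in enumerate(values):
        if value in last_seen:
            gap = index - last_seen[value] - 1
            best = gap if best is None else min(best, gap)
        last_seen[value] = index
    return best


def violated_restrictions(order: List[Token]) -> List[int]:
    """Return the numbers (1-5) of the restrictions that `order` breaks."""
    half = len(order) // 2
    first, second = order[:half], order[half:]
    broken = []
    if longest_run([t.vowel for t in order]) > 2:
        broken.append(1)
    if Counter(t.vowel for t in first) != Counter(t.vowel for t in second):
        broken.append(2)
    if longest_run([t.initial for t in order]) > 2:
        broken.append(3)
    gap = smallest_gap([t.talker for t in order])
    if gap is not None and gap < 6:
        broken.append(4)
    if Counter(t.talker for t in first) != Counter(t.talker for t in second):
        broken.append(5)
    return broken


# A deliberately simple, non-random 72-token order that meets all five
# restrictions; it is not the order used in the experiment.
vowels = ["i", "I", "E", "ae", "a", "O", "V", "U", "u"]
stops = ["p", "t", "k", "b", "d", "g"]
order = [Token(vowels[n % 9], stops[n % 6], f"T{n % 12}") for n in range(72)]
print(violated_restrictions(order))
```

A randomized order could be produced by shuffling and retrying until this
check returns an empty list, although the report does not say how its order
was generated.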

Procedure. Listening tests were administered to small groups of subjects
using the equipment and procedures described in Experiment I. Listeners
responded on score sheets printed with columns of key letters representing
each of the nine vowels. Above each column, key words containing these letters
were printed as follows: "sin sum sand seen sh[illegible] sent soon saw
should." The key letters in the columns were preceded and followed by blank
lines. Before the listening test, the experimenter pronounced each key word
followed by its vowel in isolation. Special attention was drawn to the key
letters that represented the vowel /u/.

Subjects in the Mixed Talker condition were required to identify only the
vowel in each syllable. They did this by circling, for each syllable, the key
letter(s) that symbolized the perceived vowel. Listeners heard the entire test
series twice for a total of 144 judgments per subject.

Three groups of subjects were tested in the Segregated Talker condition. All
three groups were required to identify only the vowel in the syllables, and
they did so in the same way as the subjects in the Mixed Talker condition. As
in Experiment I, each group of subjects heard the three talkers in one of
three orders: MWC, WCM, or CMW. Again, data for only the first two tests were
analyzed, making the number of judgments per subject equal to that for the
Mixed Talker tests (i.e., 144 judgments per subject). Subjects in all
conditions were told that some of the test syllables were real words and that
some were nonsense syllables, but that they were to ignore meaning and respond
only on the basis of the sound of the syllables.

Subjects. All subjects were paid volunteers obtained from undergraduate
psychology courses at the University of Minnesota. All were native speakers of
English and most were natives of the upper midwest region. Twenty-two subjects
served in the Mixed Talker condition. Twenty-four subjects were tested in the
Segregated Talker condition, eight with each of the three counterbalanced
orders.

Results and Discussion

Table 4 presents the overall error rates for the two conditions of this
experiment along with the results of Experiment I for comparison. There was no
significant difference between the error rates for the Segregated Talker
condition (22.9 percent) and the Mixed Talker condition (21.7 percent)
[t(44 df) = 0.43].

TABLE 4: Overall identification errors (in percent) for Experiments I and II.

                                Segregated Talkers     Mixed Talkers

Experiment I    /p-p/ Test             9.5                 17.0
                /#-#/ Test            31.2                 42.6

Experiment II   C-C Test              22.9                 21.7



The major question of interest was whether consonantal context aids vowel
identification even when the context is unpredictable. The results for the C-C
test syllables may be compared with those found in Experiment I for /p-p/
syllables and isolated vowels (cf. Table 4). For the Mixed Talker condition,
vowels in C-C syllables were identified with significantly greater accuracy
than were comparable isolated vowels, as tested by a median test: χ² (1 df) =
18.24, p < .01. The overall error rate of 21.7 percent for C-C syllables was
not significantly greater than the 17 percent errors found for vowels in /p-p/
syllables [χ² (1 df) = .23]. Thus, the results for the Mixed Talker condition
are clear; both fixed and variable consonantal frames produced a dramatic
improvement in vowel identifiability in contrast to isolated vowels. The
advantage of a consonantal environment obtains even when the identity of the
consonants is not known in advance by the listeners.⁴
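
For readers unfamiliar with the median test, the following sketch shows one
common form of it. It is an assumption that this form matches the analysis in
the report, which does not say how ties were handled or whether a continuity
correction was applied. The per-listener error percentages are hypothetical
and are not data from this study.

```python
from statistics import median


def median_test_chi2(group_a, group_b):
    """Chi-square (1 df, no continuity correction) for a two-sample median test.

    Each score is classified as above or not above the grand median, and the
    resulting 2 x 2 table is tested for independence.
    """
    grand = median(list(group_a) + list(group_b))
    a_above = sum(x > grand for x in group_a)
    b_above = sum(x > grand for x in group_b)
    a_below = len(group_a) - a_above
    b_below = len(group_b) - b_above
    n = len(group_a) + len(group_b)
    numerator = n * (a_above * b_below - a_below * b_above) ** 2
    denominator = (
        (a_above + a_below)
        * (b_above + b_below)
        * (a_above + b_above)
        * (a_below + b_below)
    )
    return numerator / denominator


# Hypothetical per-listener error percentages (NOT data from the report).
cc_errors = [15, 18, 19, 20, 22, 25]
isolated_errors = [35, 38, 40, 42, 45, 50]
chi2 = median_test_chi2(cc_errors, isolated_errors)
print(chi2)
```

The function divides by zero if every score falls on one side of the grand
median, so a fuller implementation would need to guard against that case.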



The overall results for the Segregated Talker condition were less conclusive.
Vowels in C-C syllables were, on the average, better identified than isolated
vowels: χ² (1 df) = 6.08, p < .02. However, unlike the Mixed Talker results,
listeners did not identify vowels in C-C syllables as accurately as vowels in
/p-p/ syllables [χ² (1 df) = 25.6, p < .01]. The error rate for the Segregated
C-C test appears to be idiosyncratic in that there was no advantage over the
comparable Mixed Talker condition. (For the /p-p/ and /#-#/ tests, the
advantage of Segregated test over Mixed test was 8 and 12 percent,
respectively.)

Table 5 presents the errors for each vowel category in the two C-C conditions.
(Confusion matrices are given in Appendices A-5 and A-6.) Results for
individual vowel categories in the Mixed Talker condition (right-hand column)
verified the pattern found for overall errors. In comparison with the data for
the Mixed /#-#/ test (Table 1), vowels of each category, with the exception of
/ɔ/, were identified with greater accuracy when they were spoken in a variable
consonantal frame than when they were spoken in isolation.

TABLE 5: Experiment II: Identification errors (in percent) for each intended
vowel in two experimental conditions.

Intended Vowel       Segregated Talkers       Mixed Talkers

      i                      8                      6
      ɪ                     12                     17
      ɛ                     14                     24
      æ                     13                     15
      ɑ                     41 (15)                31 (7)
      ɔ                     44 (10)                37 (11)
      ʌ                     11                     18
      ʊ                     46                     39
      u                     17                      8

Overall errors              23% (17)               22% (16)

Results for individual vowel categories in the Segregated Talker tests
(left-hand column) showed an unexpectedly high error rate for back vowels,
/ɑ/, /ɔ/, /ʊ/, and /u/, for all three talkers. Errors on these vowels account
for the lack of an overall advantage in the Segregated condition over the
Mixed condition with C-C syllables. We currently have no explanation for this
result.

The results of this experiment support the claim that consonantal context
aids in the specification of vowel identity by providing important acoustic
information to the listener. Even when the consonants are not known in
advance, listeners are much more accurate in identifying medial vowels in CVC
syllables than they are in identifying isolated steady-state vowels.⁵ The
acoustic effects of coarticulation carry substantial information about a
medial vowel, which aids in vowel identification whether or not the listener
has prior knowledge of the consonants' identity.⁶

⁴It is worth noting that tokens by the subset of 12 talkers used in the C-C
test yielded 20 percent errors on the /p-p/ test. Thus, if anything, errors in
the C-C study are probably overestimated relative to the results one might
expect for a test including all 15 talkers.

⁵In a separate study, similar results were found when subjects were asked to
identify both the consonants and the vowel in each test syllable. Errors in
vowel identification averaged 29 percent. Thus, even with the additional task
of identifying the consonants, error rates were substantially lower than when
listeners were required to identify vowels in isolation.

⁶Two aspects of the design of the C-C tests make further interpretation of the
results problematic. First, although each consonant appeared equally often,
the occurrences of consonants in initial and final position were not balanced
across vowels, nor were equal numbers of consonants contributed by different
talkers in the Mixed test. As a result, we cannot make precise statements
about the relative advantages of fixed and variable contexts, about the
interaction of context with talker variation, or about the relative effects of
different consonants on the identifiability of coarticulated vowels. A second
problem concerns a possible interaction between vowel categories and prior
familiarity with particular test items. Many of the C-C syllables are words
that are familiar to the listeners. If this factor has a major effect on the
perception of vowels in tasks like ours (in spite of the closed response set
and the instructions to ignore meaning), the superior recognition of C-C
syllables might have little to do with the type of acoustic information made
available. If so, one might expect that listeners would do far better on
syllables that formed words than on those that were nonsense syllables. Of the
72 C-C syllables included in the present experiment, 38 were English words.
The overall error rate for these tokens in the Mixed Talker test was 15
percent, compared to a 25 percent error rate for the 34 remaining C-C
syllables. While this suggests that linguistic experience is a factor in vowel
identification under these conditions, two further observations should be
made. First, both error rates are well below that obtained for isolated
vowels. Thus, if experience is a factor at all, it is probably secondary to
the presence of phonetic context. Second, the error rates for the real words
and nonsense syllables are difficult to interpret, since the fraction of C-C
syllables that are real words varies with different vowel categories. The
analysis is further complicated by intrinsic differences in perceptual
difficulty among the nine vowels and by differences among the C-C syllables in
orthographic representation.

SUMMARY AND CONCLUSIONS

In Experiment I, perceptual tests of vowels produced in isolation and in a
fixed CVC context by the same talkers demonstrated that providing a
consonantal environment increases the likelihood of correct identification of
the medial vowel. This was true both when talker variation was present and
when it was not; the advantage of consonantal context was independent of
talker variation. Of the two factors investigated, consonantal context was
much more important than talker variation in determining listeners'
identification of vowels. The increment in error for isolated vowels in
comparison to the medial vowels was more than three times greater than the
increment attributable to unpredictability of talker.

We considered what might account for the difference in intelligibility between
vowels in /p-p/ environment and in isolation. We conclude that the poor
intelligibility of isolated vowels could not be attributed to the talkers'
failure to produce these vowels in a consistent manner or to their adoption of
aberrant formant frequencies. Measurements showed that formant frequency
values and relative durations of isolated vowels were generally quite similar
to those of vowels in the consonantal frame. The relative intelligibility of a
token cannot be estimated very precisely from its position in the space
defined by the two formants, a fact also noted by Peterson and Barney (1952).
• • « 

The second experiment showed that consonantal context aids vowel
identification even when the consonant frame varies unpredictably. Vowels
produced in randomly varying stop-consonant environments were identified more
accurately than were isolated vowels both when the talker was fixed within a
test block and when talkers, as well as context, varied unpredictably.

These results are surely puzzling if one makes the assumption that target
frequencies of the formants alone could fully specify the vowels. If that were
so, an isolated quasi-steady-state utterance ought to be an optimal signal for
perception. It is true that synthetic steady-state vowels based on these
formant parameters are fairly intelligible to naive listeners and may be
identified quite consistently by experienced listeners (Delattre, 1951).
Moreover, in the domain of automatic speech recognition, some success has been
achieved with a static model of the vowel. Gerstman (1968) devised an
algorithm based on frequencies of the first and second formants of /h-d/
syllables recorded from 76 talkers by Peterson and Barney (1952). Gerstman's
algorithm sorted nine vowels in this set with only 2.5 percent error, less
than was made by human listeners. From such a result, one might infer that
target formant frequencies can unambiguously specify the vowels of English as
produced by a variety of talkers.
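
To make the idea of a static, talker-normalized vowel model concrete, the
sketch below rescales each talker's first and second formant values against
that talker's own extremes. It is an illustration of range normalization in
general, not a reproduction of Gerstman's algorithm: the 0-999 scale, the
function name `self_normalize`, and the formant values are assumptions chosen
for the example, and no classification step is included.

```python
def self_normalize(formants):
    """Rescale one talker's (F1, F2) values to 0..999 using the talker's own range.

    `formants` maps a vowel label to an (F1, F2) pair in Hz.
    """
    f1_values = [f1 for f1, _ in formants.values()]
    f2_values = [f2 for _, f2 in formants.values()]

    def rescale(x, lowest, highest):
        return 999.0 * (x - lowest) / (highest - lowest)

    return {
        vowel: (
            rescale(f1, min(f1_values), max(f1_values)),
            rescale(f2, min(f2_values), max(f2_values)),
        )
        for vowel, (f1, f2) in formants.items()
    }


# Illustrative formant targets in Hz (hypothetical talkers, not measured data).
talker_a = {"i": (270.0, 2290.0), "a": (730.0, 1090.0), "u": (300.0, 870.0)}
# A second talker whose formants are uniformly 20 percent higher.
talker_b = {v: (1.2 * f1, 1.2 * f2) for v, (f1, f2) in talker_a.items()}

norm_a = self_normalize(talker_a)
norm_b = self_normalize(talker_b)
print(norm_a["i"], norm_b["i"])
```

A uniform scaling between talkers disappears entirely after this rescaling,
which is why a static model can look attractive. Real talkers do not differ by
a single scale factor, and the procedure uses only one cross section of each
syllable.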

However, as we have seen, this conception of the vowel cannot be reconciled
easily with certain facts of perception. Vowels in isolation were poor signals
from the perceiver's standpoint, even though talkers adopted targets that
differed little from those attained in citation-form /p-p/ syllables. Thus, we
may suspect that no single cross section through the syllable can fully
specify the vowel. This inference is consistent with previous studies in the
phonetic literature, to which we have referred. It is also relevant, in this
context, to mention the results of an experiment by Bond (1975) on perception
of vowels created by iteration of a single cycle from steady-state vowel
tokens. Perception of such vowels by naive listeners was even less reliable
than the results we obtained for unedited isolated vowels. If target
frequencies alone were fully adequate to specify the vowels, it is difficult
to understand these results.⁷

We are led to conclude that cues that are ordinarily regarded as consonantal
contribute regularly to the perception of the vowel. We suspect that much
vowel information is contained in formant transitions, as Lindblom and
Studdert-Kennedy (1967) suggested some time ago. Whatever the nature of the
contribution consonantal environment makes to the identification of a vowel,
the data we have reviewed point to the general conclusion that no single
temporal cross section of a syllable conveys as much vowel information to a
perceiver as is given in the dynamic contour of the formants. From the
standpoint of perception, it would seem that the definition of a vowel ought
to include a specification of how the relevant acoustic parameters change over
time. While listeners may be trained to identify steady-state tokens
accurately (Lehiste and Meltzer, 1973), there is no reason to believe that the
processes involved in this activity are the same as those typically used for
understanding speech in natural situations.

Finally, these results may have implications for understanding the vocal-tract
normalization problem. Attempts to specify vowels across talkers have usually
taken as their basic data the formant frequency values of a single cross
section of a syllable. Our research indicates that the human perceptual system
is ill-equipped to deal with such data. It would seem fruitful to renew the
search for invariants across talkers utilizing information defined over the
time course of at least a syllable.

⁷The implications of the specification of vowels in terms of idealized
"targets" are explored further in Shankweiler, Strange, and Verbrugge (in
press).

REFERENCES

Abramson, A. S. and F. S. Cooper. (1959) Perception of American English vowels
    in terms of a reference system. Haskins Laboratories Quarterly Progress
    Report QPR-32, Appendix 1.
Bond, Z. S. (1975) Identification of vowels excerpted from context. J. Acoust.
    Soc. Am., Suppl. 57, S24(A).
Delattre, P. C. (1951) The physiological interpretation of sound spectrograms.
    Publications of the Modern Language Association of America 66, 864-875.
Fairbanks, G. and P. Grubb. (1961) A psychophysical investigation of vowel
    formants. J. Speech Hearing Res. 4, 203-219.
Fourcin, A. J. (1968) Speech source inference. IEEE Trans. Audio
    Electroacoust. AU-16, 65-67.
Fujimura, O. and K. Ochiai. (1963) Vowel identification and phonetic contexts.
    J. Acoust. Soc. Am. 35, 1889(A).
Gerstman, L. H. (1968) Classification of self-normalized vowels. IEEE Trans.
    Audio Electroacoust. AU-16, 78-80.
Harris, C. M. (1953) A study of the building blocks in speech. J. Acoust. Soc.
    Am. 25, 962-969.
House, A. S. and G. Fairbanks. (1953) The influence of consonant environment
    upon the secondary acoustical characteristics of vowels. J. Acoust. Soc.
    Am. 25, 105-113.
Jones, D. (1956) An Outline of English Phonetics. (Cambridge, England:
    W. Heffer).
Lehiste, I. and D. Meltzer. (1973) Vowel and speaker identification in
    natural and synthetic speech. Lang. Speech 16, 356-364.
Liberman, A. M. (1970) The grammars of speech and language. Cog. Psychol. 1,
    301-323.
Liberman, A. M., F. S. Cooper, D. P. Shankweiler, and M. Studdert-Kennedy.
    (1967) Perception of the speech code. Psychol. Rev. 74, 431-461.
Lindblom, B. E. F. (1963) Spectrographic study of vowel reduction. J. Acoust.
    Soc. Am. 35, 1773-1781.
Lindblom, B. E. F. and M. Studdert-Kennedy. (1967) On the role of formant
    transitions in vowel recognition. J. Acoust. Soc. Am. 42, 830-843.
Millar, J. and W. A. Ainsworth. (1972) Identification of synthetic isolated
    vowels and vowels in h-d context. Acustica 27, 278-282.
Ohman, S. E. G. (1966) Coarticulation of VCV utterances: Spectrographic
    measurements. J. Acoust. Soc. Am. 39, 151-168.
Peterson, G. E. and H. L. Barney. (1952) Control methods used in a study of
    the vowels. J. Acoust. Soc. Am. 24, 175-184.
Peterson, G. E. and I. Lehiste. (1960) Duration of syllable nuclei in English.
    J. Acoust. Soc. Am. 32, 693-703.
Potter, R. K. and J. C. Steinberg. (1950) Toward the specification of speech.
    J. Acoust. Soc. Am. 22, 807-823.
Rand, T. C. (1971) Vocal tract size normalization in the perception of stop
    consonants. Haskins Laboratories Status Report on Speech Research
    SR-25/26, 141-146.
Schatz, C. (1954) The role of context in the perception of stops. Language
    30, 47-56.
Shearme, J. N. and J. N. Holmes. (1962) An experimental study of the
    classification of sounds in continuous speech according to their
    distribution in the formant 1 - formant 2 plane. In Proceedings of the
    Fourth International Congress of Phonetic Sciences, ed. by A. Sovijärvi
    and P. Aalto. (The Hague: Mouton), pp. 234-240.
Shankweiler, D. P., W. Strange, and R. R. Verbrugge. (in press) Speech and
    the problem of perceptual constancy. In Perceiving, Acting, and Knowing:
    Toward an Ecological Psychology, ed. by R. Shaw and J. Bransford.
    (Hillsdale, N. J.: Lawrence Erlbaum Assoc.).
Stevens, K. N. and A. S. House. (1963) Perturbations of vowel articulations
    by consonantal context: An acoustical study. J. Speech Hearing Res. 6,
    111-128.
Strange, W., R. R. Verbrugge, and D. Shankweiler. (1974) Consonant environment
    specifies vowel identity. Haskins Laboratories Status Report on Speech
    Research SR-37/38, 209-216.
Tiffany, W. R. (1959) Nonrandom sources of variation in vowel quality.
    J. Speech Hearing Res. 2, 305-317.
Verbrugge, R. R., W. Strange, D. P. Shankweiler, and T. R. Edman. (in press)
    What information enables a listener to map a talker's vowel space?
    J. Acoust. Soc. Am. [Also in Haskins Laboratories Status Report on Speech
    Research SR-45/46 (this issue).]




APPENDIX A: CONFUSION MATRICES 



Tables report the frequency with which each intended vowel x was identified
as response alternative y. In addition, summary statistics for each condition
are provided: the percent error for each intended vowel, the overall percent
error, and the number of listeners (N).
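
The percent error reported for each row is the share of responses that fall
outside the intended category. The sketch below shows the calculation for one
row. It uses the /ɛ/ row of Table A-3, whose ten response cells are all
filled; the function name and the ASCII vowel labels are illustrative.

```python
def percent_error(row, intended):
    """Percent of responses in `row` that differ from the intended vowel."""
    total = sum(row.values())
    errors = total - row.get(intended, 0)
    return 100 * errors / total


# Responses to intended /E/ (epsilon), Table A-3 (isolated vowels, Mixed
# Talker condition). "None" counts trials with no response.
row_E = {
    "i": 1, "I": 2, "E": 61, "ae": 64, "a": 2,
    "O": 6, "V": 10, "U": 5, "u": 3, "None": 6,
}
print(round(percent_error(row_E, "E"), 1))
```

The result agrees with the 61.9 percent listed for that row, with omitted
responses counted as errors.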







TABLE A-1: Vowels in /p-p/ syllables: Mixed Talker condition.ᵃ

Intended    Response                                          Percent
vowel       i  ɪ  ɛ  æ  ɑ  ɔ  ʌ  ʊ  u  None                   error

  i         188  1  1                                           1.1
  ɪ         187  1  2                                           1.6
  ɛ         139  47  3  1                                      26.8
  æ         33  154  2  1                                      18.9
  ɑ         152  19  17  2                                     20.0
  ɔ         1  46  138  1  4                                   27.4
  ʌ         18  5  161  6                                      15.3
  ʊ         8  2  47  116  16  1                               38.9
  u         2  3  185                                           2.6

ᵃOverall percent error = 17.0 percent; N = 19.



TABLE A-2: Vowels in /p-p/ syllables: Segregated Talker condition.ᵃ

Intended    Response                                          Percent
vowel       i  ɪ  ɛ  æ  ɑ  ɔ  ʌ  ʊ  u  None                   error

  i         329  1                                              0.3
  ɪ         3  318  4  2  2  1                                  3.6
  ɛ         1  290  20  4  7  5  3                             12.1
  æ         5  324  1                                           1.8
  ɑ         7  255  62  4  2                                   22.7
  ɔ         55  269  2  4                                      18.5
  ʌ         11  9  305  4  1                                    7.6
  ʊ         29  19  272  10                                    17.6
  u         1  2  327                                           0.9

ᵃOverall percent error = 9.5 percent; N = 33.



TABLE A-3: Isolated vowels: Mixed Talker condition.ᵃ

Intended                          Response                            Percent
vowel        i     ɪ     ɛ     æ     ɑ     ɔ     ʌ     ʊ     u  None    error

  i        119    30     6                             1     4           25.6
  ɪ          2   124    19                 3     6     1     1     4     22.5
  ɛ          1     2    61    64     2     6    10     5     3     6     61.9
  æ                2    51    84     3    10     1     6     2     1     47.5
  ɑ          1           1    20    62    47    21     2           6     61.3
  ɔ                1     2     2    18   112    17     6     1     1     30.0
  ʌ                1           6    32    31    60    22     4     4     62.5
  ʊ                1     5     3          15    48    81     1     5     49.4
  u          2           1     1           7     6    16   124     1     22.5

ᵃOverall percent error = 42.6 percent; N = 16.



TABLE A-4: Isolated vowels: Segregated Talker condition.ᵃ

Intended                          Response                            Percent
vowel        i     ɪ     ɛ     æ     ɑ     ɔ     ʌ     ʊ     u  None    error

  i        251     3     1     1           1     1     6    33     3     16.3
  ɪ          5   259    21           1     3     3     1     4     3     13.7
  ɛ          4     7   161    92           6     9     7           5     46.3
  æ                     48   221     3    18     3     2     3     2     26.3
  ɑ                      2    37   107   135    17     1     1           64.3
  ɔ                1     1    12    43   214    19     6                 28.7
  ʌ                1     6    30    47    31   174     9                 42.0
  ʊ                      3     4     3    10    51   214    12     1     28.7
  u          8     1     1     3     1     2     3    22   258     1     14.0

ᵃOverall percent error = 31.2 percent; N = 30.






TABLE A-5: Vowels in C-C syllables: Mixed Talker condition.ᵃ

Intended                          Response                            Percent
vowel        i     ɪ     ɛ     æ     ɑ     ɔ     ʌ     ʊ     u  None    error

  i        331     7     5     1                 1           5     2      6.0
  ɪ          2   292    53     1                 2     2                 17.1
  ɛ          3    20   269    31     2          21     3           3     23.6
  æ                     47   298           7                             15.3
  ɑ                4     2     6   242    85     6     4     1     2     31.3
  ɔ          1     3     1     2    91   222    18     6     4     3     36.9
  ʌ                     21     5    14     4   289    17     1     1     17.9
  ʊ          1     6     1           8    10    70   214    41     1     39.2
  u          5                       2           6    16   323            8.2

ᵃOverall percent error = 21.7 percent; N = 22.



TABLE A-6: Vowels in C-C syllables: Segregated Talker condition.ᵃ

Intended                          Response                            Percent
vowel        i     ɪ     ɛ     æ     ɑ     ɔ     ʌ     ʊ     u  None    error

  i        354     2    17           1                 1     5     4      7.8
  ɪ          4   339    35                             1     1     4     11.7
  ɛ         10    21   329    13     1           1           1     8     14.3
  æ          2     1    28   333     2     7           1          10     13.3
  ɑ                1     1    23   225   100    15     4     6     9     41.4
  ɔ                1          11   130   217     4    10     4     7     43.5
  ʌ                      3     3    16     8   342     8           4     10.9
  ʊ          2     4     2          10     1    53   209    91    12     45.6
  u          1     1     1           5     2     5    48   318     3     17.2

ᵃOverall percent error = 22.9 percent; N = 24.






What Information Enables a Listener to Map a Talker's Vowel Space?* 


Rpbert^R. Verbruege, Winifred Strange, Donald P. Shankweiler , and 

Thomas R. Edman"*^ 



ABSTRACT 

9 

Prior experience with a talker's speech contributes little to 
success in vowel identification. Adult listeners avaraged only 12.9ri 
percent etrors on 15 vowels in /h-d/ syllables spoken in mixed order 
by 30 talkers (men, women, and children), and 17.0 percent errors on 
9 vowels spoken in /p-p/ syllables by 15 talkers. When the /p-p/ « 
test series was spoken by single talkers, errors decreased by less 
than half to 9.5 percent. Experience with known subsets of a talker's 
vowels did not significantly reduce errors on subsequent test tokens: 
following the point vowels^ (/i/, /a/, /u/), errors averaged 12.2 per- 
cent on vowels in /h-d/ context and 15.2 percent in /p-p/ context; 
following three central vowels (/i/, /ae/, 7A/), ertors averaged 14.9 
percent in /p~p/ context. Precursors mainly influenced listeners' 
response biases, rather than facilitating true improvements in vowel 
identifiability. These results did not support the hypothesis that 
point vowels provide listeners with unique information for normalizing 
a talker's "vowel space.** Errors on vowels in, rap;id, destressed /p-p/ 



*A partial summary of these results was presented at: the 87th meeting of the 
Acoustical Society of America, New York, 25 April. 1974 (see Verbrugge, 
Strange, and Shankweiler, 1974; see also Shankweiler,. Strange, and Verbrugge, 
in press). This article is to be published in tfie Journal of the Acoustical 
Society of America (1976) . 
« 

University of Minnesota, Minneapolis; currently at University of Michigan, 
Ann Arbor. 



ERIC 



University of Minnesota, Minneapolis. 
Also University of Connecticut, Storrs. 

Acknowledgment: This paper reports research begun during the academic year
1972-73 while D. Shankweiler was a guest investigator at the Center for Research
in Human Learning, University of Minnesota, Minneapolis. The work was
supported by grants to the Center and to Haskins Laboratories from the
National Institute of Child Health and Human Development, by grants awarded
to D. Shankweiler and J. J. Jenkins by the National Institute of Mental
Health, and by a fellowship to R. Verbrugge from the University of Michigan
Society of Fellows. We wish to thank Kevin Jones, Kathleen Briggs, Robert
Jenkins, and Mark Jaffe for their assistance in the experimental work, Keith
Smith for his helpful advice on data analysis, and James Jenkins for his advice
and encouragement throughout this research.

[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]




syllables (excised from sentence context) averaged 23.8 percent.
Errors jumped to 28.6 percent when point-vowel precursors were introduced,
while presentation of syllables in the original sentences
reduced errors to 17.3 percent. Sentence context aids vowel identification
by allowing adjustment primarily to a talker's tempo, rather
than to the talker's vocal tract.

INTRODUCTION

The acoustic structure of speech varies markedly from one talker to another.
The spectrographic measurements of Peterson and Barney (1952) showed that center
frequencies of vowel formants vary widely across men, women, and children, and
that considerable variation also exists among talkers of the same sex and age
group. Similar results were found by Peterson (1961). This acoustic variation
is attributed to differences in the sizes and shapes of talkers' vocal cavities.
Since each talker's vowels are idiosyncratic in their acoustic composition, it
has been thought that a listener needs an extended sample of a talker's speech
in order to identify vowel tokens accurately. In general terms, such experience
would enable listeners to adjust to each voice they encounter.

Instead of supplying typical frequency values for each vowel, experience
with a voice is thought to result in a more general adjustment to the talker's
"vowel space." This assumes that a listener identifies a particular vowel of a
given talker in terms of the relation between its acoustic structure and the
acoustic structure of other vowels produced by the same person (Joos, 1948;
Ladefoged and Broadbent, 1957; Ladefoged, 1967). The first sample of a talker's
speech will calibrate (or "normalize") the framework to which the listener refers
later vowel tokens for identification. Ladefoged and Broadbent (1957)
tested this idea with synthetically produced stimuli and found that the perception
of an acoustically fixed test word varied predictably as the formant frequencies
of a carrier sentence were shifted up or down. They interpreted this
result within the framework of adaptation level theory (Helson, 1948), which
assumes that perceivers regularly gauge the range of a stimulus continuum in the
process of formulating psychophysical judgments.

There have been few explicit hypotheses about how much precursory speech
from a talker is required for accurate calibration and what phonetic information
is most effective. The most common suggestion, dating back to Joos (1948), is
that the point vowels /i, ɑ, u/ are the primary calibrators of vowel space. The
most recent proponents of this view are Lieberman and his colleagues (Lieberman,
Crelin, and Klatt, 1972; Lieberman, 1973). They argue that experience with the
point vowels (or the related glides /j, w/) is a necessary condition for accurate
identification of syllables produced by a novel talker. They note that the
point vowels are exceptional in several ways: (1) they represent the extreme
positions in a talker's articulatory vowel space, (2) they represent the extremes
of formant frequency values in a talker's acoustic vowel space, (3) they are
acoustically stable for small changes in articulation (Stevens, 1972), and
(4) they are the only vowels in which an acoustic pattern can be related to a
unique vocal-tract area function (Lindblom and Sundberg, 1969; Stevens, 1972).
Other vowels are ambiguous unless calibration to a vocal tract has taken place.

There is little evidence to support the claim of a special role for the
point vowels. Suggestive evidence is provided by Gerstman (1968), who developed
a computer algorithm for recognition of vowels. Gerstman's algorithm used the
extreme values of a talker's formant frequencies (usually those of /i, ɑ, u/) to
scale all of the talker's vowels. The algorithm operated on these normalized
values and classified the vowels produced by the Peterson and Barney (1952)
panel with a high level of accuracy. However, it must be recognized that such
an algorithm is not a perceptual strategy, but only a logically possible strategy.
There is no evidence that human listeners perform the computations found
in Gerstman's algorithm (such as scaling formants or computing their sums and
differences). The results of Ladefoged and Broadbent (1957) provide no assistance
on the question of point vowels, since their study did not systematically
vary the phonetic content of the precursory speech.
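The kind of scaling attributed to Gerstman's algorithm can be illustrated with a minimal sketch. This is not Gerstman's actual procedure, which the text describes only in outline; it assumes a simple range normalization in which each formant is rescaled between the talker's own lowest and highest values. The function name `range_normalize` and the formant values are hypothetical and chosen only for illustration.

```python
def range_normalize(formants_hz):
    """Rescale each formant to 0-1 using the talker's own minimum and maximum.

    `formants_hz` maps a vowel label to a tuple of formant frequencies in Hz.
    ASSUMPTION: plain min-max scaling per formant, one talker at a time.
    """
    n_formants = len(next(iter(formants_hz.values())))
    lows = [min(f[k] for f in formants_hz.values()) for k in range(n_formants)]
    highs = [max(f[k] for f in formants_hz.values()) for k in range(n_formants)]
    return {
        vowel: tuple((f[k] - lows[k]) / (highs[k] - lows[k])
                     for k in range(n_formants))
        for vowel, f in formants_hz.items()
    }


# Illustrative (F1, F2) values in Hz for one hypothetical talker.
talker = {
    "i": (270, 2290),
    "ɑ": (730, 1090),
    "u": (300, 870),
    "ɛ": (530, 1840),
}
normalized = range_normalize(talker)
```

In this toy example the extreme values come from the point vowels, so after scaling /i/, /ɑ/, and /u/ sit on the edges of the unit square and the remaining vowel falls inside it.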

More generally, there is reason to doubt whether a preliminary normalization
step plays the major role in vowel perception that is commonly attributed
to it. Remarkably low error rates have been found when human listeners identify
single syllables produced by human talkers. Peterson and Barney (1952) and
Abramson and Cooper (1959) found average error rates of 4 to 6 percent when listeners
identified the vowels in h-vowel-d words spoken in random order by a
group of talkers. The test words were spoken as isolated syllables, and in most
conditions the listeners had little or no prior experience with the talkers'
voices. On the face of it, these low observed error rates seem inconsistent
with any theory that stresses the need for extended prior experience with a
talker's vowel space. However, it is difficult to assess the full significance
of these findings, since several vowels were substantially more ambiguous than
the mean error rates would suggest, and the possible role of point vowels in reducing
those ambiguities was not explored.

For these reasons, it is worth investigating what information listeners
actually rely upon in natural speech for identifying the vowels produced by a
variety of talkers. There is currently no consensus about the perceptual problem
posed by vowels in the context of a single syllable, nor about the information
gained during experience with a voice. In particular, there is no perceptual
evidence that the point vowels play a special role as calibrators of a
talker's vowel space. The experiments reported here represent a systematic investigation
of these questions.

EXPERIMENT I: THE PERCEPTION OF VOWELS IN /h-d/ ENVIRONMENT

Identifying a vowel in a naturally spoken syllable should be most difficult
when a listener has had no prior experience with the talker's voice. Thus, the
need for normalization over several syllables can best be assessed by presenting
listeners with a series of single syllables, each spoken by a different talker.
The presence of many natural sources of talker-related acoustic variation (for
example, differences in age, sex, vocal-tract size, and characteristic pitch
level) should maximize the difficulty of such a test. These test conditions
were approximated in the perceptual experiments of Peterson and Barney (1952),
who presented 20 tokens from each of 10 talkers (men, women, and children) in
each block of trials, and Abramson and Cooper (1959), who used 15 tokens spoken
by each of 8 adult talkers. Both experiments studied vowels in a fixed /h-d/
consonantal frame.

Our first experiment also used /h-d/ syllables and addressed two major
issues: (1) the need for extended familiarization with a talker's vowel space,
and (2) the possible role of the point vowels as calibrators of that space.
Compared to earlier studies, a greater effort was made in this study to eliminate
any potential contribution of familiarity with individual talkers' voices.
Thirty talkers each spoke only three syllables distributed throughout the test.
In addition, five diphthongs were added to the ten vowels studied by Peterson
and Barney in order to make all perceptual alternatives available to the listeners:
/i, ɪ, ɛ, æ, ɑ, ɔ, ʌ, ʊ, u, ɝ, eɪ, oʊ, aɪ, aʊ, ɔɪ/.

There were two test conditions in the experiment. The No-Precursor test
contained a long series of /h-d/ syllables; vowel identity and talker identity
were unpredictable from one syllable to the next. In the Point-Vowel Precursor
test, each /h-d/ test syllable was preceded by a string of three syllables containing
the point vowels /i, ɑ, u/ spoken by the same talker. The three vowels
were spoken in a /k-p/ consonantal environment; thus, the precursor string contained
real words that were different from the test words. The listeners' task
in each condition was to identify the vowels in the test syllables. A comparison
of the errors made in the two conditions provides a direct measure of the
information supplied by exposure to a talker's point vowels. If the point vowels
serve as primary calibrators of vowel space, one would expect significantly
better vowel identification in the Point-Vowel Precursor condition than in the
No-Precursor condition.

Method

1. Stimulus materials. Thirty talkers of varying ages, physical sizes,
and characteristic pitch ranges were selected. The group included 13 men, 12
women, and 5 children. All talkers spoke English as their native language, but
they were heterogeneous in dialect.

The talkers were recorded individually in a sound-attenuated experimental
room with a ReVox A77 stereo tape recorder and Spher-o-dyne microphone. Each
talker recorded the full list of 15 test syllables twice, plus two repetitions
of the precursor string. The syllables in each precursor string were read at a
rate of one per second. The first utterance of each syllable or precursor string
was used in the listening tests, unless the talker had clearly mispronounced it.

The test series for each condition contained 90 test syllables, presented
in three blocks of 30 syllables each. Each talker contributed only three syllables
containing different vowels to the test, one syllable to each block. Each
vowel appeared a total of six times, twice within each block. Vowels were
assigned to talkers randomly. The order of presentation of syllables within
blocks was random, with the following constraints: (1) no less than ten trials
intervened between tokens produced by the same talker in one block and the next,
and (2) no vowel appeared more than twice in succession.
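The design constraints just described can be sketched as a small constrained randomization. The sketch below is an assumption about how such a series could be generated today (by simple rejection sampling), not a description of how the authors assembled their tapes. All names (`assign_vowels`, `boundary_ok`, `build_test`) are hypothetical, and the vowel labels are a reading of the 15-vowel set.

```python
import random

VOWELS = ["i", "ɪ", "ɛ", "æ", "ɑ", "ɔ", "ʌ", "ʊ", "u", "ɝ",
          "eɪ", "oʊ", "aɪ", "aʊ", "ɔɪ"]
N_TALKERS = 30
N_BLOCKS = 3
MIN_GAP = 10  # trials that must intervene between a talker's tokens


def assign_vowels(rng):
    """Give every talker one vowel per block: each vowel occurs twice in a
    block, and no talker repeats a vowel across blocks."""
    blocks = []
    for _ in range(N_BLOCKS):
        while True:
            vowels = VOWELS * 2          # 30 tokens, two per vowel
            rng.shuffle(vowels)          # vowels[t] goes to talker t
            if all(vowels[t] != prev[t]
                   for prev in blocks for t in range(N_TALKERS)):
                blocks.append(vowels)
                break
    return blocks


def boundary_ok(prev_block, block):
    """Check both ordering constraints across the join of two blocks."""
    pos_prev = {talker: i for i, (talker, _) in enumerate(prev_block)}
    for j, (talker, _) in enumerate(block):
        intervening = (len(prev_block) - 1 - pos_prev[talker]) + j
        if intervening < MIN_GAP:
            return False
    # Within a block a vowel occurs only twice, so a run of three can
    # only arise across the block boundary.
    joined = [v for _, v in prev_block[-2:] + block[:2]]
    return all(len(set(joined[k:k + 3])) > 1 for k in range(len(joined) - 2))


def build_test(seed=0):
    """Return three blocks, each a list of (talker, vowel) pairs."""
    rng = random.Random(seed)
    test = []
    for vowels in assign_vowels(rng):
        block = list(enumerate(vowels))
        while True:
            rng.shuffle(block)
            if not test or boundary_ok(test[-1], block):
                test.append(list(block))
                break
    return test


test_series = build_test()
```

Rejection sampling is adequate here because each constraint is satisfied by a reasonable fraction of random shuffles; a tighter spacing rule would call for a constructive method instead.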



The Point-Vowel Precursor test was constructed first. Test trials were
assembled in the order just described. For each trial, a precursor string was
rerecorded, followed by the appropriate test syllable for the same talker. A
1-sec pause was inserted between the last precursor syllable and the test syllable.
The same precursor string preceded all three of a talker's test syllables.
Peak intensity for each precursor string and test syllable was equalized within
0.5 dB as monitored on the VU-meter of the tape recorder. A 4-sec intertrial
interval was inserted between each test syllable and the following set of precursors,
and a 10-sec interval was inserted between blocks of 30 syllables.

The No-Precursor test was constructed by rerecording the test syllables and
deleting the precursors. Thus, the two tests contained identical test syllables;
the order of presentation, the intervals between successive test syllables, and
the intensity of the syllables were all the same.




2. Procedure. Tests were presented to small groups of subjects in a quiet
experimental room via a Crown CX 822 tape recorder, McIntosh MC40 amplifier, and
AR acoustic suspension loudspeaker. The output level was the same for both
tests, as monitored by a Heathkit AC VTVM placed just ahead of the output to the
loudspeaker. The level was clearly audible in all parts of the room. Subjects
responded on score sheets that contained 15 response alternatives, all written
out in full and arrayed in rows as follows: "hood, head, hoed, heard, who'd,
hide, heed, how'd, hud, hayed, hod, hoyed, had, hid, hawed." They were told
that they would hear "several different talkers." Subjects in the Point-Vowel
Precursor condition were informed that each test word would be preceded by three
other words spoken by the same person, and that listening to those three words
might help them identify the fourth. Subjects listened to the full test series
twice, for a total of 180 judgments per subject, 12 on each intended vowel.

3. Subjects. The listeners were 37 paid volunteers from undergraduate
psychology classes at the University of Minnesota. All were native speakers of
English and most were native to the upper midwest region of the United States.
Seventeen were subjects in the No-Precursor condition, while 20 were subjects in
the Point-Vowel Precursor condition.

Results and Discussion 






Errors in vowel identification were tabulated for each condition. An error
was defined as a failure to select the vowel intended by the talker; the error
category included omissions, that is, failures to select any alternative. In
the No-Precursor condition, subjects made an average of 12.9 percent errors, and
in the Point-Vowel Precursor condition, subjects averaged 12.2 percent errors on
the test syllables. Contrary to the prediction that point-vowel precursors
would substantially reduce errors, the error rates for the two conditions were
not significantly different [t(35) = 0.57].
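The 35 degrees of freedom match an independent-groups comparison of the 17 and 20 listeners (17 + 20 - 2). The text does not say which form of the t-test was used, so the following is only a sketch under the assumption of a pooled-variance two-sample test. The function name `pooled_t` is hypothetical, and the toy scores are invented for illustration and are not the study's data.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Two-sample t statistic with a pooled variance estimate.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


# Degrees of freedom implied by the two listener groups in the text.
df_reported = 17 + 20 - 2

# Toy per-subject error percentages (hypothetical, NOT the study's data).
t_toy, df_toy = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```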

The error rate in the No-Precursor condition was somewhat higher than the
error rates found in the two earlier studies using /h-d/ syllables. Peterson
and Barney (1952) reported an overall error rate of 5.6 percent. Their lower
observed rate may be due to the smaller number of response alternatives in their
study (10 instead of 15), the smaller number of talkers appearing in a particular
block of trials (10 instead of 30), and the larger total number of tokens
from each talker (20 instead of 6). Abramson and Cooper (1959) reported an
error rate of 4.0 percent in a study involving 15 vowel alternatives and eight
adult talkers. In contrast to the present study, talkers carefully selected
tokens they considered typical, and the listeners were familiar with the talkers
(in fact, the group of listeners included the talkers). In addition, the number
of talkers in the Abramson and Cooper study was smaller (8 instead of 30) and
the total number of tokens from each talker was larger (15 instead of 6). Thus
there are several possible sources for the higher error rate observed in the
No-Precursor condition of this study. But whatever the source, it must not be
overlooked that 12.9 percent is a remarkably low error rate for a 15-alternative
response set, especially if one believes that a single syllable from a novel
talker is a highly ambiguous entity.

Though experience with talkers' point vowels did not reduce overall errors,
it is important to determine whether the precursors influenced the perception of
individual vowels. The percentage of errors made on each intended vowel is presented
in Table 1 for each test condition. (Confusion matrices for these
conditions are presented in Tables A-1 and A-2 in the Appendix.) Several results
are worth noting. First, errors tended to be very high on the intended
vowels /ɑ/ and /ɔ/. Most of these errors involved confusions between the two
vowels. In fact, confusions between /ɑ/ and /ɔ/ account for 39 percent of all
errors made by listeners in the No-Precursor condition, compared to 28 percent
of all errors in Peterson and Barney's (1952) experiment. Thus, the phonetic
confusion between /ɑ/ and /ɔ/ may have contributed to the higher overall error
rate observed in this study. The degree of confusability is not surprising
since little distinction is made between /ɑ/ and /ɔ/ in upper midwestern dialects;
most of the listeners (and many of the talkers) were native to that region.
The error rates for identifying these two vowels, excluding /ɑ/-/ɔ/ confusions,
are included in parentheses in Table 1.



TABLE 1: Mean percent error in identification of /h-d/ syllables.

                              Condition
Intended vowel     No-precursor      Point-vowel precursor

i                   1.0               0.0
ɪ                  20.1              29.6
ɛ                  19.1               9.2
æ                  12.3               9.6
ɑ                  48.5  (9.3)       43.3  (4.6)
ɔ                  18.1  (9.3)       42.9 (19.2)
ʌ                  14.7               3.8
ʊ                  14.7              18.3
u                   8.3               1.7
ɝ                   0.0               0.0
eɪ                  2.4               2.1
oʊ                 12.7               4.6
aɪ                  2.0               0.0
aʊ                 16.2              17.9
ɔɪ                  3.9               0.0

Overall            12.9  (9.7)       12.2  (8.0)

Parenthesized figures present the mean percent error when
confusions between /ɑ/ and /ɔ/ are excluded.
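As a consistency check on Table 1, the overall figures can be recomputed from the per-vowel entries. The sketch below assumes that the overall rate is the unweighted mean across the 15 vowels, which is reasonable because every vowel was presented equally often. The vowel symbols are a reading of the scanned table, and the names `TABLE_1`, `EXCLUDED`, and `overall` are introduced only for this illustration.

```python
# Percent error per intended vowel: (No-Precursor, Point-Vowel Precursor).
TABLE_1 = {
    "i": (1.0, 0.0), "ɪ": (20.1, 29.6), "ɛ": (19.1, 9.2), "æ": (12.3, 9.6),
    "ɑ": (48.5, 43.3), "ɔ": (18.1, 42.9), "ʌ": (14.7, 3.8), "ʊ": (14.7, 18.3),
    "u": (8.3, 1.7), "ɝ": (0.0, 0.0), "eɪ": (2.4, 2.1), "oʊ": (12.7, 4.6),
    "aɪ": (2.0, 0.0), "aʊ": (16.2, 17.9), "ɔɪ": (3.9, 0.0),
}
# Parenthesized entries: error rates with /ɑ/-/ɔ/ confusions excluded.
EXCLUDED = {"ɑ": (9.3, 4.6), "ɔ": (9.3, 19.2)}
DIPHTHONGS = ["eɪ", "oʊ", "aɪ", "aʊ", "ɔɪ"]


def overall(table, condition):
    """Unweighted mean percent error across vowels for one condition."""
    return sum(row[condition] for row in table.values()) / len(table)


no_precursor = overall(TABLE_1, 0)
point_vowel = overall(TABLE_1, 1)

adjusted = {**TABLE_1, **EXCLUDED}
no_precursor_adjusted = overall(adjusted, 0)
point_vowel_adjusted = overall(adjusted, 1)

# Mean diphthong error pooled over both conditions.
diphthong_mean = sum(sum(TABLE_1[v]) for v in DIPHTHONGS) / (2 * len(DIPHTHONGS))
```

Rounded to one decimal place, these means reproduce the "Overall" row of the table (12.9 and 12.2, or 9.7 and 8.0 with the confusions excluded), and the diphthong mean comes out close to 6 percent.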



Second, several vowels were identified very accurately, even in the No-Precursor
condition. This is true for two of the three point vowels (/i/ and
/u/), for /ɝ/, and for three of the diphthongs (/eɪ/, /aɪ/, and /ɔɪ/). Low
error rates for /i/, /u/, and /ɝ/ were also observed by Peterson and Barney
(1952). The presence of two point vowels in this group verifies predictions
that they should be relatively unambiguous (cf. Lieberman et al., 1972), although
their role as calibrators remains in question. The low error rates for diphthongs
suggest that their addition to the response set did not contribute much
to the higher overall error rate in this study. The error rate for the five
diphthongs averaged only 6 percent across the two conditions.



Third, and most importantly, there was no consistent pattern of change when
test syllables were preceded by point-vowel precursors. This was true even for
the relatively ambiguous vowels. Of the seven vowels showing a greater-than-average
number of errors in the No-Precursor condition, three showed an apparent
improvement following precursors (/ɛ/, /ɑ/, /ʌ/), while four showed an increase
in errors (/ɪ/, /ɔ/, /ʊ/, /aʊ/). Thus, in terms of overall errors on individual
vowels, there was no consistent support for the hypothesis that experience with
a talker's point vowels allows a listener to disambiguate troublesome vowels.

The differences in error rate for individual vowels need to be interpreted
with caution. Differences in response biases in the two conditions could have
been responsible for some of the apparent changes in identifiability. That is,
a vowel could have been correctly identified more often simply because it was
more popular as a response. One indication of such a response bias is how often
a vowel is used as an incorrect response to other vowels; when the vowel becomes
more popular, the frequency of these false identifications increases. Figure 1
depicts the results of a preliminary analysis for response biases. The horizontal
axis indicates the change in correct identification (in percent) between the
Point-Vowel Precursor and No-Precursor conditions. Placement to the right of
the central vertical line represents superior performance in the Point-Vowel
Precursor condition compared to that in the No-Precursor condition. The vertical
axis indicates the change in false identification. (This is defined as the percentage
of vowel tokens incorrectly identified as a particular vowel.) Placement
above the central horizontal line represents a greater frequency of false
identifications in the Point-Vowel Precursor condition relative to the No-Precursor
condition.

In this preliminary analysis, "true" improvements attributable to precursors
may be defined by an increase in correct responses, coupled with a decrease
in false identifications.¹ Of the vowels that were most ambiguous in the No-Precursor
condition, only /ɑ/ showed genuine improvement by this measure. Several
less ambiguous vowels also showed genuine improvement: /æ, u, oʊ, aɪ, ɔɪ/.
On the other hand, a change in correct identification that corresponds in sign
with a change in false identification may be referred to descriptively as a



¹It is important to note that the relationship between the scales on the horizontal
and vertical axes is arbitrary. For example, if a vowel appears in the
upper right-hand quadrant on a 45° line passing through the origin, this cannot
be interpreted as an increase in correct responding that is "perfectly correlated"
with the increase in false responding. In Figures 1, 2, and 3, the
aspect ratios have been chosen so that the ranges of values on each dimension
are given roughly equal weight. It is also important to note that the differences
plotted are linear functions of error scores. On either axis, the differences
indicate the relative contribution of each vowel to the overall change
in percent identification. However, the values plotted give no indication of
the proportionate change in identification on each vowel. For example, if vowel
X increased in correct identification from 50 to 55 percent, and vowel Z
increased from 94 to 99 percent, each would appear along the horizontal axis at
+5 percent, though the proportionate improvement is larger for Z. The primary
goal of these figures is their heuristic value in visualizing relative directions
of change in two variables. Choice of the linear transform should not be
interpreted as a claim about what differences represent "equivalent" changes in
the recognition system.
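The footnote's numerical example can be worked through directly. The sketch below assumes that "proportionate improvement" refers to the fraction of errors eliminated, which is one reading consistent with the remark that the plotted differences are linear functions of error scores. The function name `change_summary` is hypothetical.

```python
def change_summary(before, after):
    """Return (difference in percent correct, proportion of errors removed)."""
    difference = after - before
    errors_before = 100 - before
    errors_after = 100 - after
    return difference, (errors_before - errors_after) / errors_before


first = change_summary(50, 55)    # the vowel that rises from 50 to 55 percent
second = change_summary(94, 99)   # the vowel that rises from 94 to 99 percent
```

Both vowels plot at +5 on the horizontal axis, yet the first loses one tenth of its errors (50 to 45 percent) while the second loses five sixths of them (6 to 1 percent).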



[Figure 1 plot. Horizontal axis: difference in percent correct identification
(-20 to +20). Vertical axis: difference in percent false identification
(-1.0 to +1.0). Regions are labeled "Negative Bias," "Positive Bias," and
"True Improvement."]

Figure 1: Change in correct and false identification attributable to /kip,
kɑp, kup/ precursors (/h-d/ syllables). Each axis plots the difference
between the Point-Vowel Precursor condition and the No-Precursor
condition.

"positive" or "negative bias." Two vowels, /ɛ/ and /ʌ/, showed a clear positive
bias, while /ɪ/, /ʊ/, and /aʊ/ showed a negative bias. The remaining ambiguous
vowel /ɔ/ showed no sign of improvement: a large increase in false responses
was associated with a large decrease in correct responses.
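The descriptive categories used in this preliminary analysis can be summarized as a sign rule on the two difference scores. The sketch below is an illustration rather than the authors' procedure. It assumes that false identification of a vowel is computed over the tokens of all other intended vowels, which is one reading of the definition in the text. The label "decrement" for the fourth quadrant, the function names, and the toy confusion counts are all hypothetical.

```python
def identification_rates(confusions, vowel):
    """Percent correct and percent false identification for one vowel.

    `confusions[intended][response]` holds response counts.
    ASSUMPTION: false identification is the percentage of the other
    vowels' tokens that were labeled as `vowel`.
    """
    own = confusions[vowel]
    correct = 100.0 * own.get(vowel, 0) / sum(own.values())
    others = [row for v, row in confusions.items() if v != vowel]
    false_hits = sum(row.get(vowel, 0) for row in others)
    false_id = 100.0 * false_hits / sum(sum(row.values()) for row in others)
    return correct, false_id


def classify_change(d_correct, d_false):
    """Name the quadrant of a (change in correct, change in false) point."""
    if d_correct > 0 and d_false < 0:
        return "true improvement"
    if d_correct > 0 and d_false > 0:
        return "positive bias"
    if d_correct < 0 and d_false < 0:
        return "negative bias"
    if d_correct < 0 and d_false > 0:
        return "decrement"
    return "no clear change"


# Toy two-vowel confusion counts (hypothetical, not the appendix tables).
without_precursor = {"ɑ": {"ɑ": 6, "ɔ": 6}, "ɔ": {"ɑ": 2, "ɔ": 10}}
with_precursor = {"ɑ": {"ɑ": 8, "ɔ": 4}, "ɔ": {"ɑ": 1, "ɔ": 11}}

c0, f0 = identification_rates(without_precursor, "ɑ")
c1, f1 = identification_rates(with_precursor, "ɑ")
label = classify_change(c1 - c0, f1 - f0)
```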

The analysis displayed in Figure 1 cannot indicate which changes are significant
departures from chance variability, nor can it fully disentangle changes
in stimulus identifiability from changes in response biases. The number of
false identifications of a vowel x might increase, not because of an increased
response bias toward x, but because the perceptual similarity (confusability)
of x with another vowel y may have increased. Correct and false identification
scores for x will reflect the combined impact of changes in the similarity of x
to several other vowels (some similarities may increase, while others decrease)
and changes in response biases of all vowels concerned. Luce's Choice Axiom
(Luce, 1959, 1963) provides one means of modeling these interactions in a confusion
matrix. The model assigns a similarity parameter $\eta_{xy}$ to each pairwise
combination of stimuli and a response bias parameter $\beta_y$ to each response
alternative. The combined action of these parameters determines a predicted
distribution of responses in the confusion matrix.



The Luce model is useful because it allows one to assess the significance
of changes in a similarity parameter from one condition to another.² In the
present experiment, any beneficial effect of hearing point-vowel precursors
should manifest itself in a decrease in pairwise similarity measures (i.e.,
pairwise confusions should decrease). Of the 105 possible pairwise combinations
of 15 stimuli, 12 pairs accounted for 81 percent of the errors in the No-Precursor
condition and 88 percent of the errors in the Point-Vowel Precursor condition.
Similarity measures were determined for each of these pairs, and a
t-statistic was computed to assess the significance of the difference between
the measures for the two conditions. Only two of the pairs showed a significant
change in similarity following point-vowel precursors: /ɑ-ɔ/ and /ɔ-aʊ/; both
were cases of increased confusability and both involved the vowel /ɔ/. This was
a genuine decrement in performance on /ɔ/, which cannot be attributed to an
overall change in response biases (as might be expected from Figure 1). None
of the other confusable pairs showed significant changes in similarity.

These results have direct implications for the six vowels in Figure 1 that
showed change in the direction of "true" improvement: /æ, ɑ, u, oʊ, aɪ, ɔɪ/.
The confusion pairs for which similarity measures were obtained include the
major sources of error for each of these vowels. With one exception, none of
these sources of error showed a significant effect of point-vowel precursors.
The exception was the confusability of /ɑ/ and /ɔ/, which showed a large increase.
(The increase appeared mainly in incorrect /ɔ/ responses to /ɑ/, possibly
due to a contrast between tokens of /ɑ/ in the precursor strings and the
test syllables.) In general, then, even the "true" improvements cannot be interpreted
as anything more than expressions of chance variability.

Thus, the patterns of error with and without point-vowel precursors were
similar, showing major differences only in the identification of /ɔ/. The presence
of these differences indicates that the precursors did have an impact on
subjects' judgments; the nonsignificant difference in overall errors between the
two conditions cannot be due to inattention to the precursor strings. Even so,



²The predicted frequency of identifying an intended vowel x as the response
alternative y, $e_{xy}$, is defined by the formula:

$$e_{xy} = \frac{\beta_y \, \eta_{xy} \, n_x}{N},$$

where N is the number of vowel categories (15 in Experiment I) and $n_x$ is the
total number of intended vowels x that were presented (12 per subject in Experiment I).
These "expected values" were estimated for each cell of the confusion
matrices, using an algorithm developed by J. E. Keith Smith at the University
of Michigan. At the theoretical limit, the procedure outputs the set of maximum
likelihood estimators for the observed pattern of errors. The x-y similarity
parameters were estimated as follows:
$\eta_{xy} = \left( e_{xy} e_{yx} / e_{xx} e_{yy} \right)^{1/2}$. Since $-\ln \eta_{xy}$
closely approximates a normal distribution, similarity parameters for two conditions
may be compared using the t-statistic,
$t = 2(\ln \eta_2 - \ln \eta_1) / (V_1 + V_2)^{1/2}$,
where V is the estimated variance. A full development of this general procedure
may be found in Goodman (1969, 1970).
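The comparison described in this footnote can be sketched numerically under several loudly stated assumptions. First, the similarity estimate is read as the square root of the cross-product ratio of the four cells involved, which is how the factor of 2 in the t-statistic is most naturally explained, although the scan is unclear at that point. Second, raw cell counts stand in for the fitted expected values, which the authors obtained with a separate estimation algorithm. Third, the variance of the log cross-product ratio is taken to be the sum of reciprocal cell counts, a standard large-sample approximation that the text does not spell out. All names and counts below are hypothetical.

```python
import math


def similarity(counts, x, y):
    """Similarity of stimuli x and y from a confusion matrix.

    `counts[intended][response]` are cell frequencies; all four cells
    involved must be non-zero.
    """
    ratio = (counts[x][y] * counts[y][x]) / (counts[x][x] * counts[y][y])
    return math.sqrt(ratio)


def log_ratio_variance(counts, x, y):
    """ASSUMPTION: variance of the log cross-product ratio, approximated
    by the sum of reciprocal cell counts."""
    return sum(1.0 / counts[a][b] for a in (x, y) for b in (x, y))


def compare_similarity(counts_1, counts_2, x, y):
    """t = 2 (ln eta_2 - ln eta_1) / sqrt(V_1 + V_2)."""
    eta_1 = similarity(counts_1, x, y)
    eta_2 = similarity(counts_2, x, y)
    v_1 = log_ratio_variance(counts_1, x, y)
    v_2 = log_ratio_variance(counts_2, x, y)
    return 2 * (math.log(eta_2) - math.log(eta_1)) / math.sqrt(v_1 + v_2)


# Hypothetical counts for one vowel pair (not taken from the appendix tables).
condition_1 = {"x": {"x": 90, "y": 10}, "y": {"x": 10, "y": 90}}
condition_2 = {"x": {"x": 80, "y": 20}, "y": {"x": 20, "y": 80}}

eta_1 = similarity(condition_1, "x", "y")
eta_2 = similarity(condition_2, "x", "y")
t_value = compare_similarity(condition_1, condition_2, "x", "y")
```

A positive value of `t_value` indicates greater similarity (more confusion) in the second condition, which is the direction reported for the two significant pairs in the text.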




there is no support in these results for the point-vowel hypothesis; the major
differences involved increases in ambiguity and shifts in response biases.

Perhaps the most striking result is that subjects generally had little difficulty
identifying the test syllables, even when there was no prior information
about talkers' vocal tracts. It is possible that the level of identification
was so high in the No-Precursor condition that there was little room for improvement:
87 percent may represent a ceiling on identifiability of these test syllables
under any conditions. Thus the failure to find a precursor effect in
this experiment might indicate (1) that point vowels do not bear the kind of information
hypothesized, or (2) that there may be no need for such information,
if there are no errors that are a function of uncertainties in normalization.
It is necessary to know what component (if any) of the 12.9 percent error rate
is due to subjects' uncertainty about the vocal tracts to which they are listening.
This would define the maximum improvement in identification that could be
contributed by the presence of precursors. The next experiment was designed to
measure the error component attributable to vocal-tract uncertainty and to reassess
the potential value of sample vowels in reducing that uncertainty.

EXPERIMENT II: THE PERCEPTION OF VOWELS IN /p-p/ ENVIRONMENT

Two conditions in this experiment were designed to measure the error component
in vowel perception that is attributable to talker variation. In the
Mixed Talker condition a large number of talkers spoke a series of syllables; on
each test syllable the listener encountered a voice that was unfamiliar and unpredictable.
(This condition is comparable to the No-Precursor condition of
Experiment I.) In the Segregated Talker condition subjects heard the same series
of syllables spoken by one person, so there was ample opportunity to become familiar
with the voice and the talker was fully predictable from one syllable to the
next. The difference between the error rates in these two conditions provides a
measure of the increment in perceptual error introduced by talker variation.

Two additional mixed talker conditions were included to reassess the role
of precursory information in reducing perceptual errors. In each condition, the
test syllables of the Mixed Talker test were preceded by a precursor string from
the appropriate talker. In the Point-Vowel Precursor condition, the precursor
string was /hi, hɑ, hu/ (/h-/ syllables were chosen to facilitate articulation,
while minimizing nonvocalic sources of information). In the Central-Vowel Precursor
condition, each syllable was preceded by /hɪ, hæ, hʌ/.³ As was argued in
Experiment I, point-vowel precursors should substantially reduce errors if they
are privileged carriers of information for normalization. A comparable set of
nonpoint vowels should produce little or no improvement in identification, by
the same hypothesis. Finally, if the information available in point vowels is
essentially that gained during extended familiarization with a vocal tract, then
performance in the Point-Vowel Precursor condition should resemble that in the
Segregated Talker condition.



³The term "central vowel" is used only in contrast to "point vowel," not in the
more restricted sense found in traditional phonetic taxonomies. Of the six
central vowels so defined, a set of three with fairly wide dispersion in two-formant
space were chosen for this condition.



Several changes made in the design of this experiment were intended to increase
the average level of errors beyond that found in Experiment I. First,
the consonantal context for the vowels was changed from /h-d/ to /p-p/. The
/p-p/ environment was chosen because vowel duration tends to be shorter in voiceless
stop contexts than in voiced contexts (Stevens and House, 1963). Second,
an effort was made to reduce syllable duration and increase coarticulation effects
by encouraging talkers to speak rapidly when recording the syllables.
Third, the five diphthongs and /ɝ/ were eliminated from the vowel set, since
they tended to produce few errors and would be relatively uninformative in the
present design.

Method 

1. Stimulus materials. A panel of 15 talkers (five men, five women, and
five children) was chosen to produce the test syllables for the mixed talker
conditions. They were selected to represent a wide variety of vocal-tract sizes
and characteristic fundamental frequencies. None were phonetically trained
speakers. In the judgment of the experimenters, the talkers represented a fairly
homogeneous dialect group, that of the upper midwest region from which the
listeners were also drawn.

The Mixed Talker tests consisted of 45 tokens, 5 tokens of each of the 9
syllables: /pip/, /pɪp/, /pɛp/, /pæp/, /pɑp/, /pɔp/, /pʌp/, /pʊp/, and /pup/.
Each talker contributed three test syllables. Vowels were randomly assigned to
talkers with the constraint that each talker contributed three different vowels,
only one of which was a point vowel (/i/, /ɑ/, or /u/). Thus, the five tokens
of each syllable type were spoken by different talkers. In addition to three
test syllables, each talker produced two sets of precursors: /hi, hɑ, hu/ and
/hɪ, hæ, hʌ/. The syllables in each triplet were read at a rate of one per
second. No attempt was made to control the intonation pattern of the
three-syllable utterance.

The 45 recorded syllables for the Mixed Talker test were arranged in a
random presentation order with the constraints that (1) the same intended vowel
did not appear more than twice consecutively, and (2) tokens produced by the
same talker were separated by not less than 8 tokens. A 4-sec interval was
inserted between tokens, and a 10-sec interval was inserted after each block of
15 tokens.
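The two ordering constraints can be made concrete with a short sketch. It is an illustration only, not the procedure the experimenters used: the fixed vowel rotation in `assign_vowels`, the greedy "busiest talker first" heuristic, and the restart loop are all assumptions introduced here, and "separated by not less than 8 tokens" is read as at least 8 intervening tokens.

```python
import random

POINT = ["i", "ɑ", "u"]                      # point vowels
NONPOINT = ["ɪ", "ɛ", "æ", "ɔ", "ʌ", "ʊ"]    # the remaining six vowels


def assign_vowels(n_talkers=15):
    """One point vowel and two different non-point vowels per talker.

    A fixed rotation stands in for the random assignment described in the text.
    """
    tokens = []
    for t in range(n_talkers):
        vowels = (POINT[t % 3], NONPOINT[(2 * t) % 6], NONPOINT[(2 * t + 1) % 6])
        tokens.extend((t, v) for v in vowels)
    return tokens


def violates(order, token, min_gap=8):
    """True if appending `token` would break either ordering constraint."""
    talker, vowel = token
    if any(t == talker for t, _ in order[-min_gap:]):
        return True  # same talker within the last `min_gap` tokens
    # the same intended vowel may not appear three times in a row
    return len(order) >= 2 and order[-1][1] == vowel and order[-2][1] == vowel


def presentation_order(tokens, rng, max_tries=1000):
    """Build a constrained pseudo-random order, restarting on dead ends."""
    for _ in range(max_tries):
        remaining, order = list(tokens), []
        while remaining:
            legal = [tok for tok in remaining if not violates(order, tok)]
            if not legal:
                break  # dead end: start over
            left = {}
            for t, _ in remaining:
                left[t] = left.get(t, 0) + 1
            # favor talkers with the most tokens left so none is stranded late
            most = max(left[t] for t, _ in legal)
            choice = rng.choice([tok for tok in legal if left[tok[0]] == most])
            order.append(choice)
            remaining.remove(choice)
        if not remaining:
            return order
    raise RuntimeError("no valid order found")


if __name__ == "__main__":
    for talker, vowel in presentation_order(assign_vowels(), random.Random(0)):
        print(talker, vowel)
```

Because the heuristic always serves the talkers with the most unused tokens, the result is less random than an unconstrained shuffle; a rejection-sampling shuffle would be closer to "random order with constraints" but would need many more attempts.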

The Point-Vowel Precursor test was constructed by inserting copies of each
talker's point-vowel triplet in front of the appropriate three test syllables in
a copy of the Mixed Talker test. In each case a 1-sec interval was inserted
between the offset of the final precursor syllable and the test syllable.

The Central-Vowel Precursor test was constructed using each talker's
central-vowel triplet, according to the same procedures. Thus, all three Mixed
Talker tests contained identical test syllables; the order of presentation, the
intensity levels, and the intertrial intervals were all the same.

For the Segregated Talker test, one representative man, one woman, and one
child were selected from the full panel of talkers.⁴ For each component test
(Man, Woman, Child) the talker produced the full series of 45 test syllables,
five different tokens of each of the nine syllable types. The 45 tokens were
arranged in the same order as in the Mixed Talker test.⁵

⁴The man, woman, and child chosen as "representative" were individuals in each
group of talkers whose test syllables produced a close-to-average number of

2. Procedure. Tests were presented to small groups of subjects under the
same listening conditions as in Experiment I. Subjects responded on score
sheets that contained nine response alternatives in each row: "pip, pup, pap,
peep, pop, pep, poop, pawp, puup." The experimenter pronounced each word,
drawing special attention to the last word, "puup," which stood for the syllable
/pʊp/. The three Mixed Talker tests were presented to independent groups of
subjects. Subjects completed two repetitions of the 45 test trials, for a total
of 90 judgments per subject, 10 on each intended vowel. Three additional groups
of subjects listened to the Segregated Talker tests; each group completed all
three tests: Man (M), Woman (W), and Child (C). The order of presentation of the
tests was counterbalanced across groups in the orders: MWC, WCM, and CMW. For
each group of subjects, data from only the first two tests were analyzed. Thus,
the total number of judgments for the Segregated Talker condition was equal to
that for each Mixed Talker condition (90 judgments per subject) and any effects
of fatigue or task familiarity were equally distributed across the three talkers
in the Segregated Talker tests.
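The counterbalancing claim can be checked mechanically. The sketch below is illustrative (the function name is made up here): it takes the three reported orders, keeps only the first two tests of each, and tallies the serial position in which each talker's test was analyzed.

```python
ORDERS = ["MWC", "WCM", "CMW"]  # presentation orders reported in the text


def analyzed_positions(orders, n_analyzed=2):
    """Map each test (M, W, C) to the serial positions at which it was analyzed."""
    positions = {}
    for order in orders:
        for position, test in enumerate(order[:n_analyzed]):
            positions.setdefault(test, []).append(position)
    return positions


positions = analyzed_positions(ORDERS)
# Each talker's test is analyzed twice: once as a first test, once as a second.
balanced = all(sorted(p) == [0, 1] for p in positions.values())
judgments_per_subject = 2 * 45  # two analyzed tests of 45 trials each
```

With these orders every talker contributes one first-position and one second-position test, which is what allows fatigue and familiarity effects to be spread evenly over the three talkers.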

3. Subjects. The listeners were 79 paid volunteers from undergraduate
psychology classes at the University of Minnesota. All were native speakers of
English and most were native to the upper midwest region of the United States.
In mixed talker conditions, 19 subjects heard the Mixed Talker test, 15 heard
the Point-Vowel Precursor test, and 12 heard the Central-Vowel Precursor test.
The remaining 33 subjects served in the Segregated Talker condition; 11 subjects
heard each of the counterbalanced orders.

Results and Discussion 


In the Mixed Talker condition (without precursors), subjects made an average
of 17.0 percent errors in identifying vowels produced by the panel of randomly
ordered talkers, while in the Segregated Talker condition, listeners averaged
9.5 percent errors for the vowels of the three single talkers. [The mean error
rates for the individual tests were 9.8 percent (Man), 6.8 percent (Woman), and
11.8 percent (Child).] Familiarity with a talker's voice significantly improved
the accuracy of identification [t(50) = 5.14, p < .01]. Even so, this factor
accounts for less than half of the errors in the Mixed Talker condition.

There are two ways to look at the error percentages for /p-p/ syllables. First,
on the Segregated Talker test, 9.5 percent is a relatively high error rate,
considering the complete predictability from trial to trial of both the talker's
voice and the consonantal frame. There are sources of vowel ambiguity not
attributable to uncertainties in calibration. Second, on the Mixed Talker test,
17 percent is a relatively low error rate, given that each judgment is made with
no familiarity with the voice and without the benefit of sentence context. This
error rate is not substantially greater than the overall 12.9 percent rate found
for /h-d/ syllables in a similar mixed talker test (No-Precursor condition,
Experiment I), though several changes were made that were intended to increase
errors.⁶ There is clearly a great deal of information within a single syllable
that specifies the identity of its vowel nucleus.

⁴(continued) errors on the Mixed Talker test, and who were available for further
recording sessions.

⁵Acoustic measurements of vowels in the Mixed and Segregated Talker tests are
reported in a companion study (Strange, Verbrugge, Shankweiler, and Edman, in
press). Average formant frequency and relative duration values were comparable
to those reported by Peterson and Barney (1952), Peterson and Lehiste (1960),
and Stevens and House (1963).






The data for the Mixed and Segregated Talker conditions challenge the assumption
that extended familiarization with a vowel space is the primary factor
controlling vowel identification. Even so, some information must be available in
a series of utterances from a single talker, since listeners correctly
identified more vowels in the Segregated Talker test than in the Mixed Talker
test. A vowel-by-vowel analysis of subjects' errors indicates that this
improvement was not distributed evenly among the nine vowels. The first two
columns in Table 2 present the error rate for each intended vowel in the Mixed
and Segregated Talker conditions. Three of the vowels, /i, ɪ, u/, showed little
change, since almost all tokens were correctly identified in both conditions. Of
the six relatively ambiguous vowels, only /ɑ/ failed to show improvement, while
familiarization aided perception of /ɛ, æ, ɔ, ʌ, ʊ/. (Confusion matrices for
these two conditions are presented in Tables A-3 and A-4.)




TABLE 2: Mean percent error in identification of citation-form /p-p/ syllables.

                                      Condition

Intended   Mixed                 Segregated      Point-vowel     Central-vowel
vowel      talker                talker          precursor       precursor

i           1.1                   0.3             3.3             3.3
ɪ           1.6                  [illegible]      2.7             1.7
ɛ          26.8                  [illegible]      4.7            10.8
æ          18.9                   1.8            20.7            18.3
ɑ          20.0 ([illegible])    22.7 (3.9)      43.3 (26.7)     29.2 (12.3)
ɔ          [illegible] (3.2)     18.5 (1.8)      18.7 (12.7)     13.3 (2.5)
ʌ          15.3                   7.6             9.3            22.5
ʊ          38.9                  17.6            26.7            29.2
u           2.6                   0.9             7.3             5.8

Overall    17.0 (13.2)            9.5 (5.5)      15.2 (12.7)     14.9 (11.9)
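As a consistency check on Table 2, the overall error rate in each condition should equal the unweighted mean of the nine per-vowel rates, since every subject made the same number of judgments (10) on each intended vowel. The sketch below assumes that reading and uses only the two precursor columns, whose nine entries can all be read in this copy; the row order is taken to be /i, ɪ, ɛ, æ, ɑ, ɔ, ʌ, ʊ, u/.

```python
# Per-vowel percent errors from the two precursor columns of Table 2.
TABLE2 = {
    "point-vowel precursor":   [3.3, 2.7, 4.7, 20.7, 43.3, 18.7, 9.3, 26.7, 7.3],
    "central-vowel precursor": [3.3, 1.7, 10.8, 18.3, 29.2, 13.3, 22.5, 29.2, 5.8],
}


def overall_error(rates):
    """Unweighted mean of per-vowel error rates, rounded as in the table."""
    return round(sum(rates) / len(rates), 1)


overall = {name: overall_error(rates) for name, rates in TABLE2.items()}
# overall == {"point-vowel precursor": 15.2, "central-vowel precursor": 14.9}
```

Both means reproduce the "Overall" row, which supports the assignment of the scattered cell values to rows and columns.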



⁶The shift to a /p-p/ consonantal frame apparently had little effect on the
error rate for the nine vowels studied here. Errors on those nine vowels
averaged 17.4 percent in /h-d/ syllables (with 15 response alternatives),
compared to 17.0 percent in /p-p/ syllables.

As in Experiment I, it is important to isolate the contribution of response
biases and to discover whether any of the changes in vowel similarity reflect
factors other than chance variation. Again, both a graphic analysis and the
Luce choice model were applied to the data from the Segregated and Mixed Talker
conditions. The first analysis (presented in Figure 2) showed "true
improvement" in the identification of /ɛ/, /æ/, /ʌ/, /ʊ/, and /u/ in the
Segregated Talker condition. The apparent improvement for /ɔ/ was associated
with a large positive bias, while /ɑ/ showed a negative bias. The Luce
similarity analysis showed significantly reduced confusions between the
following pairs: /ɛ-æ/, /ɑ-ʌ/, /ʌ-ʊ/, and /ʊ-u/. These four confusable pairs
were major sources of error for the five vowels showing true improvement. Thus,
the increases in correct identification for these vowels reflect more than
chance variation. They represent genuine compensation for confusions due to
talker variation.
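The report does not write out the equations of the Luce analysis. A common form of Luce's biased choice model, assumed here for illustration, gives the probability of response j to stimulus i as b_j·η_ij / Σ_k b_k·η_ik, where η is a symmetric similarity matrix with η_ii = 1 and b is a vector of response biases. The sketch below (hypothetical function names and toy parameter values) shows how similarity and bias separate under that assumption.

```python
import math


def choice_probabilities(similarity, bias):
    """Predicted confusion matrix: P(j | i) = b_j * eta_ij / sum_k b_k * eta_ik."""
    n = len(bias)
    rows = []
    for i in range(n):
        weights = [bias[j] * similarity[i][j] for j in range(n)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def pair_similarity(confusions, i, j):
    """Bias-free similarity estimate for one stimulus pair.

    Under the model, p_ij * p_ji / (p_ii * p_jj) equals eta_ij squared,
    because the bias terms and row normalizers cancel.
    """
    ratio = (confusions[i][j] * confusions[j][i]) / (
        confusions[i][i] * confusions[j][j]
    )
    return math.sqrt(ratio)


# Toy two-vowel example (values are illustrative, not data from the report).
eta = [[1.0, 0.3], [0.3, 1.0]]
b = [0.7, 0.3]
predicted = choice_probabilities(eta, b)
recovered = pair_similarity(predicted, 0, 1)  # close to 0.3 despite unequal biases
```

The point of the decomposition is visible in the toy case: a strong bias toward one response inflates its apparent accuracy, but the pairwise similarity estimate is unaffected, which is the sense in which a reduced similarity counts as "true improvement."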



Figure 2: Changes in correct and false identification attributable to keeping
the talker constant throughout a test (citation-form /p-p/ syllables).
Each axis plots the difference between the Segregated Talker
condition and the Mixed Talker condition. [Plot not reproduced. Horizontal
axis: difference in percent correct identification. Plot regions are labeled
"Positive Bias," "Negative Bias," and "True Improvement."]



The failure to find true improvement for either /ɑ/ or /ɔ/ or a significant
decrease in their pairwise confusion reflects their somewhat ambiguous status in
upper midwestern dialects. On the average, errors for /ɑ/ and /ɔ/ were almost
as frequent for a single talker as they were for a mixed group of talkers.
Thus, the similarity of /ɑ/ and /ɔ/ is apparently a function of the dialect, not
of unfamiliarity with talkers' voices.

The kind of improvement resulting from familiarization with a talker's
vowel space may be summarized as follows: overall errors drop somewhat (7.5
percent in this experiment), genuine overall improvement is found for several
ambiguous vowels, and there is a significant decrease in similarity for several
vowel pairs. If the point vowels specify efficiently the kind of information
gained during extended familiarization, we would expect a similar pattern of
improvement in the Point-Vowel Precursor condition.

The results did not support this hypothesis. Exposure to a talker's point
vowels aided listeners only slightly, reducing overall errors from 17.0 to 15.2
percent; the difference was not statistically significant [t(32) = 0.97]. In the
Central-Vowel Precursor condition, overall errors also dropped slightly, to
14.9 percent, though again the change was not significant [t(29) = 1.21]. In
other words, not only was there no evidence for a gain attributable to point
vowels, but there was no difference between the point vowels and a set of
nonpoint vowels. In general, experience with specific sets of vowels seems to
make little contribution to the total reduction of errors attributable to prior
experience with a person's voice.
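The degrees of freedom reported for these comparisons match what an independent-groups t test would give for the group sizes listed under Subjects (df = n1 + n2 - 2). The report does not name the test, so that is an assumption; the sketch below simply verifies the arithmetic.

```python
# Group sizes from the Subjects section of Experiment II.
GROUP_SIZES = {
    "Mixed Talker": 19,
    "Point-Vowel Precursor": 15,
    "Central-Vowel Precursor": 12,
    "Segregated Talker": 33,
}


def df_independent(n1, n2):
    """Degrees of freedom for a pooled-variance two-sample t test."""
    return n1 + n2 - 2


def df_against_mixed(condition):
    """df for comparing one condition with the Mixed Talker group."""
    return df_independent(GROUP_SIZES["Mixed Talker"], GROUP_SIZES[condition])


reported = {
    "Segregated Talker": 50,        # t(50) = 5.14
    "Point-Vowel Precursor": 32,    # t(32) = 0.97
    "Central-Vowel Precursor": 29,  # t(29) = 1.21
}
consistent = all(df_against_mixed(c) == df for c, df in reported.items())
```

All three reported values agree with the group sizes, and the four groups sum to the 79 listeners stated in the text.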

It is important to determine whether these conclusions are affected by the
results for individual vowels. The right-hand columns in Table 2 present the
errors on each intended vowel following point-vowel and central-vowel
precursors. (Confusion matrices for these conditions are presented in Tables A-5
and A-6.) A comparison of errors in the Point-Vowel Precursor condition and the
Mixed Talker condition (without precursors) is presented in Figure 3. In
general, the point vowels did not produce a "true improvement" in the perception
of ambiguous vowels like that found in the Segregated Talker condition. Where
similar apparent improvements were found, they tended to be associated with much
higher relative levels of false identification in the Point-Vowel Precursor
condition (compare Figures 2 and 3). In other cases, apparent improvements found
for the Segregated Talker condition were not found with the point-vowel
precursors. A Luce analysis indicated that the only comparable change in
pairwise similarities was a substantial reduction in /ɛ-æ/ confusions in both
conditions. None of the other reductions found with segregated talkers were
found with point-vowel precursors. In addition, the /ɔ-ɑ/ confusion, which
showed no change with segregated talkers, showed a sharp increase in the
Point-Vowel Precursor condition.

When the Central-Vowel Precursor condition was compared to the Mixed Talker
condition on a vowel-by-vowel basis, virtually the same results were obtained.
No vowel showed more than a marginal change in the direction of true
improvement, and a significant decrease in pairwise similarity was observed for
/ɛ-æ/. However, the increase in the /ɔ-ɑ/ confusion observed with point-vowel
precursors was not observed here. Thus, to the limited extent that improvements
are found at all with precursors, there is no evidence that the three point
vowels are unique as sources of information about a talker's vowel space.

In general, however, neither set of vowel precursors was an efficient carrier
of the kind of information available in extended experience with a talker's
voice. Sets of vowels of known identity did not produce reductions in
overall errors, errors on specific vowels, or pairwise similarities comparable
to those produced by extended experience.

An extension of the Luce model allows one to make comparisons between the
overall error patterns for two experimental conditions. Specifically, one may
ask whether the same set of stimulus similarity and response bias parameters is
sufficient to describe both patterns, or whether different sets provide a closer
fit. In the latter case, one may test models in which only the similarity
parameters for each condition differ, in which only the bias parameters are
different, or in which both parameter sets differ.

Figure 3: Changes in correct and false identification attributable to /hi, hɑ,
hu/ precursors (citation-form /p-p/ syllables). Each axis plots the
difference between the Point-Vowel Precursor condition and the Mixed
Talker condition. [Plot not reproduced. Horizontal axis: difference in percent
correct identification, from -20 to 20. Plot regions are labeled "Positive
Bias," "Negative Bias," and "True Improvement."]

Joint models for the Mixed and Segregated Talker conditions suggest that the
dominant impact of extended familiarization is on perceptual similarity. The
different-parameters model (χ²/df = 3.54), in which both sets differ, provides a
closer fit than the same-parameters model (χ²/df = 5.43).⁷ This improvement is
contributed largely by different similarity parameters: the
different-similarities model (χ²/df = 3.83) fits both conditions more
successfully than the different-biases model (χ²/df = 5.27). This means that the
main effect of hearing a single talker is on a listener's ability to
discriminate the vowels themselves, not on the listener's response biases.

⁷For ease of comparison, the goodness-of-fit for each model has been
characterized by the ratio of the maximum-likelihood value to the number of
degrees of freedom. Most of the values are significant, and the Luce models
appear to be rejected. However, these significance tests assume that the
observed frequencies manifest stable population probabilities. Analysis of the
variability among subjects revealed significant heterogeneity in their responses
to several vowel categories. Thus, the reported values reflect substantial
heterogeneity among subjects, as well as deviations of the expected values from
underlying population values. When adjustments are made for the observed
heterogeneity, the fit of the Luce models is much improved. The unadjusted
ratios provide a useful measure for present purposes, since the degree of
heterogeneity was roughly constant across the experimental conditions being
compared.


A different result is found when the Mixed Talker condition is compared
with each precursor condition. In each case, estimating different similarity
parameters fails to improve the overall goodness-of-fit; different bias
parameters, on the other hand, do improve the model. When errors in the Mixed
Talker and Point-Vowel Precursor conditions are jointly modeled, the
same-parameters model (χ²/df = 3.68) fits substantially better than the
different-similarities model (χ²/df = 5.78), but not as well as the
different-biases model (χ²/df = 2.46). Similarly, when errors in the Mixed
Talker and Central-Vowel Precursor conditions are jointly modeled, the
same-parameters model (χ²/df = 2.66) is not improved by the addition of
different similarity parameters (χ²/df = 4.55), but is improved by different
bias parameters (χ²/df = 2.35). Thus, the precursors not only produced a pattern
of similarity changes different from that hypothesized, but produced change of a
different kind altogether. Precursors predominantly affected listeners'
preferences for various response alternatives, rather than their ability to
distinguish among intended vowels.
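The pattern across the three joint-model comparisons is easier to see side by side. The sketch below tabulates the χ²/df ratios as read from the text (lower is a closer fit) and reports which single change, freeing the similarity parameters or freeing the bias parameters, gives the better ratio. The dictionary layout and function names are illustrative choices made here.

```python
# chi-square / df ratios quoted in the text; lower means a closer fit.
FITS = {
    "Mixed vs. Segregated Talker": {
        "same": 5.43, "different similarities": 3.83, "different biases": 5.27,
    },
    "Mixed vs. Point-Vowel Precursor": {
        "same": 3.68, "different similarities": 5.78, "different biases": 2.46,
    },
    "Mixed vs. Central-Vowel Precursor": {
        "same": 2.66, "different similarities": 4.55, "different biases": 2.35,
    },
}


def better_single_change(fits):
    """Which freed parameter set yields the lower ratio."""
    if fits["different similarities"] < fits["different biases"]:
        return "similarities"
    return "biases"


def improves_on_same(fits, model):
    """True if the named model has a lower ratio than the same-parameters model."""
    return fits[model] < fits["same"]


summary = {name: better_single_change(fits) for name, fits in FITS.items()}
```

Only the Segregated Talker comparison favors freeing the similarity parameters; both precursor comparisons favor freeing the biases, and in both of them freeing the similarities gives a worse ratio than the same-parameters model.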


A possible shortcoming of the design of this experiment is that the test
syllables were not sufficiently "natural": since they were spoken in citation
form, the formant frequencies of their vocalic centers would not show the degree
of variability found for destressed vowels in rapidly articulated sentences. It
is possible that the task of perceiving rapidly spoken syllables places a higher
premium on information about the vocal tract. Experiment III was designed to
determine whether point vowels would benefit listeners on a mixed talker task
involving rapidly articulated vowels.


EXPERIMENT III: PERCEPTION OF VOWELS IN DESTRESSED /p-p/ SYLLABLES

In the rapidly articulated syllables of connected speech, vowel durations
tend to be short and vowel formants are not likely to reach steady-state values.
Formant values at the center of syllables in connected speech are different
from those found in single syllables spoken in citation form, the degree of
deviation depending systematically on the rate of articulation and the amount
of destressing (Tiffany, 1959; Shearme and Holmes, 1962; Lindblom, 1963; Gay,
1974). If vowel perception involves relating vowels to a "space" (defined by
some transformation on formant frequencies), then the frequency variation
contributed by speaking rate should considerably increase a listener's
difficulty in calibrating to a talker's space. This experiment explores the
perceptual problem posed when both talker-dependent and rate-dependent variation
are present. The error rate for single, rapidly articulated syllables excised
from carrier sentences should be substantially greater than that found for
syllables spoken in isolation. Given the (presumably) more difficult task of
identifying a rapid, destressed syllable, information about a talker's point
vowels may play a larger role than was found in preceding experiments.

The experiment consisted of three test conditions. In the No-Precursor
condition, listeners heard a mixed talker test containing /p-p/ syllables spoken
by the same panel of talkers used in Experiment II. The syllables were spoken
in destressed position in the context of a full carrier sentence and were
excised for use in the test. In the Point-Vowel Precursor condition, each test
syllable was preceded by a point-vowel precursor string spoken by the
appropriate talker. In the Sentence Context condition, each test syllable was
heard in the context of the carrier sentence in which it was originally
produced. One would expect the error rate in this condition to be lower than
that in the No-Precursor (and no context) condition, since more information is
available about the talkers prior to the test syllables. If so, the degree of
improvement provides a measure of the information supplied by sentence context,
when no semantic factors are involved. The pattern of improvement following
point-vowel precursors should be similar, if the predominant effect of both
types of context (precursor and sentence) is to allow calibration to a talker's
vowel space.

Method

1. Stimulus materials. Each of the 15 talkers contributed the same three
syllables they had produced for the mixed talker tests in Experiment II. In all
three conditions of this experiment, the order of talkers and test syllables was
the same as in the earlier experiment. The tests contained five tokens of each
of nine /p-p/ words; each of the five tokens was produced by a different talker
and each talker contributed only one point vowel. The test syllables were
spoken in the following carrier sentence: "The little p-p's chair is red."
Talkers were instructed to read each sentence rapidly, stressing the word
"chair."

The test syllables were excised from copies of the carrier sentences for
use in the No-Precursor and Point-Vowel Precursor tests. Each recording was
monitored and the audio tape was cut within the silent interval just preceding
the release burst of the initial /p/ and during the silent closure interval of
the final /p/. Thus, the final /p/ of the test syllables did not include a
release from closure. To produce the No-Precursor test, the 45 excised syllables
were assembled in the presentation order and then rerecorded as in Experiment
II. The Point-Vowel Precursor test was constructed by inserting copies of each
talker's point-vowel triplet in front of the appropriate three test syllables in
a copy of the No-Precursor test, using the same precursor strings and recording
procedure as in Experiment II. Thus, the No-Precursor and Point-Vowel Precursor
tests contained identical test syllables, with the same order of presentation,
intensity levels, and intertrial intervals, and each was comparable in these
respects to the mixed talker conditions of Experiment II. The Sentence Context
test was constructed using copies of the original carrier sentences. The order
of talkers and component test syllables was the same as that in the other two
tests. A 4-sec interval was inserted between sentences.

2. Procedure. Tests were presented to small groups of subjects under the
same conditions as in previous experiments. Subjects in the Sentence Context
condition were told that each test word would be spoken in the middle of the
same sentence: "The little (something)'s chair is red." The three tests were
presented to independent groups of subjects. Subjects completed two repetitions
of the 45 test trials, for a total of 90 judgments per subject, 10 on each
intended vowel.



3. Subjects. The listeners were 52 paid volunteers from undergraduate
psychology classes at the University of Minnesota. All were native speakers of
English and most were native to the upper midwest region. Twenty were subjects
in the No-Precursor condition, 17 in the Point-Vowel Precursor condition, and 15
in the Sentence Context condition.

Results and Discussion 

Listeners averaged 23.8 percent errors in identifying the vowels in the
excised syllables without precursors. As expected, this error rate is higher
than the 17.0 percent rate found for citation-form syllables in the comparable
Mixed Talker test in Experiment II. The difference between these two conditions
is significant [t(37) = 3.88, p < .01].

Given the increased ambiguity when both talker- and rate-dependent variation
are present, it might be expected that listeners would make greater use of
a talker's point vowels to reduce that ambiguity. Contrary to this expectation,
the average error rate in the Point-Vowel Precursor condition was 28.6 percent,
which is significantly higher than the 23.8 percent rate found when no
precursors are present [t(35) = 2.85, p < .01]. This is a startling result: it
does not fulfill the expectation that greater improvement would be found where
more was needed, nor does it even replicate the minor improvements found with
point-vowel precursors in Experiments I and II.

In contrast to these results for point-vowel precursors, a substantial
decrease in errors was found when the test syllables were heard in their
original sentence context. Listeners made an average of 17.3 percent errors in
the Sentence Context condition; this is significantly lower than the 23.8
percent error rate found for the test syllables in excised form [t(33) = 3.31,
p < .01]. Thus, a carrier sentence contains information that makes vowels in
component syllables less ambiguous.

Error rates for individual vowels are presented in Table 3 for each of the
three test conditions. A comparison of the results for excised syllables
(first column, Table 3) and for citation-form syllables (first column, Table 2)
suggests that listeners in the No-Precursor condition may not have accommodated
completely to the rapid pace at which the excised syllables were spoken. In
general, errors on these syllables were in the direction of hearing vowels in
the periphery of two-formant space as more "centralized" or "reduced" (cf.
confusion matrix, Table A-7). (1) Two point vowels, /i/ and /u/, which produced
very few errors in citation-form syllables, were somewhat ambiguous in the
destressed syllables. The errors on /i/ generally involved misperceiving it as
/ɪ/. The vowel /u/ tended to be misperceived as /ʊ/. (2) Errors more than
doubled on /ɑ/ and /ɔ/. By far the most common error on both /ɑ/ and /ɔ/ was to
perceive them as /ʌ/. As a consequence, /ʌ/ showed a large increase in false
identification. (3) The vowels /æ/ and /ʌ/ were also more ambiguous in
destressed syllables, and were most frequently misperceived as /ɛ/ and /ʊ/,
respectively. (4) In exception to this general pattern of increased error rates,
the vowels /ɛ/ and /ʊ/ showed substantially fewer errors in destressed
syllables. However, both vowels were popular false responses, and the apparent
improvement was associated with a positive bias in each case. It is relevant
that /ɛ/ and /ʊ/ are the most "central" vowels in two-formant space, in that
they are intermediate in first-formant frequency, and therefore reduction toward
schwa does not tend to produce formant combinations typical of other vowels. The
tendency for listeners to select more "central" vowel responses suggests that
they underestimated the tempo at which the excised syllables were spoken.



TABLE 3: Mean percent error in identification of destressed /p-p/ syllables.

                               Condition

Intended                      Point-vowel       Sentence
vowel      No-precursor       precursor         context

i          11.5               11.2               6.7
ɪ           0.5                1.8               0.7
ɛ           7.9                3.5              20.0
æ          24.5               44.1               2.0
ɑ          62.5 (43.0)        95.9 (92.4)       36.7 (12.7)
ɔ          49.5 (25.5)        50.6 (45.9)       31.3 (4.0)
ʌ          33.0               27.6              33.3
ʊ          19.0               18.2              23.3
u           4.5                4.7               1.3

Overall    23.8 (18.9)        28.6 (27.7)       17.3 (11.6)



Rather than enabling listeners to compensate for errors introduced by tempo
uncertainty, the point-vowel precursors served only to increase the errors (see
Table 3 and the confusion matrix in Table A-8). Listeners tended to hear vowels
as more centralized than those intended, and did so with even greater frequency
than in the No-Precursor condition. The trend was so strong for /ɑ/ and /ɔ/
that confusions between them accounted for only 6 percent of errors on the two
vowels themselves and only 3 percent of all errors on the Point-Vowel Precursor
test. Relatively low error rates occurred on the two most "central" vowels,
/ɛ/ and /ʊ/, as was found on the No-Precursor test.

It seems likely that the precursor syllables (spoken in citation form)
established an expected tempo inappropriate for perception of the subsequent
test syllables. Instead of calibrating listeners to the formant ranges of a
talker's vowel space, the precursors calibrated listeners to the tempo of the
talker's speech. If the test syllable had truly been spoken in isolation with
a stress equal to that of the precursors, the prior adjustment to talker tempo
would have been appropriate. This condition was met in the Point-Vowel
Precursor test of Experiment II, where errors averaged only 15 percent. However,
the comparable test in Experiment III juxtaposed syllables spoken with radically
different rates and stresses, and the contrast produced a large increase in
erroneous judgments. As in the No-Precursor condition, the pattern of errors
reflected the contraction of acoustic vowel space found for rapid, destressed
speech (cf. Lindblom, 1963).

In contrast to the results following precursors, error rates for individual
vowels dropped when the destressed test syllables were heard in sentence context
(see Table 3 and the confusion matrix in Table A-9). Error rates for /i/, /æ/,
/ɑ/, /ɔ/, and /u/ were all lower in the Sentence Context condition than in the
No-Precursor condition, where the syllables were heard in isolation. While
errors on /ɛ/ and /ʊ/ were relatively infrequent in the excised syllables, they
increased when heard in sentence context. In general, the pattern of changes
was complementary to that observed for the excised syllables. The marked
"centralization" of vowel responses disappeared when syllables were heard in
sentence context.
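The vowel-by-vowel comparison described here can be reproduced from the No-Precursor and Sentence Context columns of Table 3. The sketch below is illustrative: the assignment of table rows to vowels follows the order /i, ɪ, ɛ, æ, ɑ, ɔ, ʌ, ʊ, u/, and the one-percentage-point threshold for calling a change negligible is an assumption introduced here, not a criterion from the report.

```python
VOWELS = ["i", "ɪ", "ɛ", "æ", "ɑ", "ɔ", "ʌ", "ʊ", "u"]
NO_PRECURSOR = [11.5, 0.5, 7.9, 24.5, 62.5, 49.5, 33.0, 19.0, 4.5]
SENTENCE_CONTEXT = [6.7, 0.7, 20.0, 2.0, 36.7, 31.3, 33.3, 23.3, 1.3]


def classify_changes(before, after, threshold=1.0):
    """Sort vowels by the direction of change in percent error.

    `threshold` (percentage points) is a hypothetical cutoff below which
    a difference is treated as no change.
    """
    groups = {"lower": [], "higher": [], "unchanged": []}
    for vowel, b, a in zip(VOWELS, before, after):
        diff = a - b
        if diff <= -threshold:
            groups["lower"].append(vowel)
        elif diff >= threshold:
            groups["higher"].append(vowel)
        else:
            groups["unchanged"].append(vowel)
    return groups


changes = classify_changes(NO_PRECURSOR, SENTENCE_CONTEXT)
```

Under these assumptions the five vowels named in the text fall in the "lower" group, /ɛ/ and /ʊ/ in the "higher" group, and /ɪ/ and /ʌ/ change by less than half a percentage point.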



These results suggest that a carrier sentence aids identification of vowel
targets by allowing listeners to adjust to talker tempo, rather than by allowing
them to compensate for talker variation. The observed changes in identification
have little in common with those found after extended familiarization with a
talker's speech (cf. Figure 2). When errors in the Sentence Context and
No-Precursor conditions were compared, there were no vowels that showed "true
improvement" in identification. The main effect of sentence context was to
reverse a pattern of positive biases toward /ɛ/ and /ʊ/ (and to a lesser extent
/ɪ/ and /ʌ/), a pattern that has more to do with tempo uncertainty than with
talker variation.

Luce analyses for the three experimental conditions corroborate the
conclusions drawn from the less formal error analyses. Most pairwise confusions
were greater for destressed syllables (No-Precursor condition) than for
citation-form syllables (Mixed Talker condition, Experiment II). In two cases,
/ɑ-ʌ/ and /ɔ-ʌ/, the increases were large and significant. Thus, tempo
uncertainty produced some genuine increases in vowel confusability. However, one
significant decrease was also observed: the /ɛ-æ/ confusion, the largest source
of errors on citation-form syllables, was substantially smaller for rapid,
destressed syllables. It is possible that rapid articulation produced tokens of
/ɛ/ that would also have been produced with high probability in citation form;
that is, rapid articulation may affect /ɛ/ more by reducing its acoustic
variance than by shifting its typical formant composition. If this effect were
large enough, the overall discriminability of /ɛ/ and /æ/ would increase, as
observed.

Pairwise confusions for the Point-Vowel Precursor condition showed little
systematic change relative to the No-Precursor condition. The only significant
change was an increase in the confusability of /ɑ/ and /ʌ/. The /ɛ-æ/ confusion
was more asymmetric than in the No-Precursor condition (/ɛ/ was never perceived
as /æ/ following precursors), and the similarity showed a further, though
nonsignificant, decrease.

Pairwise confusions in the Sentence Context condition tended to be lower
than in the No-Precursor condition, though only one of the decreases (/ɔ-ʌ/) was
significant. Thus, sentence context reversed one of the two significant
increases in confusability found for the excised syllables. The other vowel
pair, /ɑ-ɔ/, also showed a reversal, but the decrease was not significant.

While the observed changes in pairwise similarities were usually in the
expected direction, they were also few in number. The predominant effect of
misperceiving tempo was not a change in vowel similarities, but an
error-producing shift in response biases. Joint Luce models for the
citation-form syllables (Mixed Talker condition, Experiment II) and destressed
syllables (No-Precursor condition) verify that the main impact of tempo
uncertainty was on response biases. A same-parameters model (χ²/df = 6.14) was
not improved by different similarity parameters (χ²/df = 7.36), but was
substantially improved by different biases (χ²/df = 3.86). Joint Luce models
comparing the destressed syllables in isolation (No-Precursor condition) with
those in sentence context yield similar results: a same-parameters model
(χ²/df = 4.18) was not improved by different similarities (χ²/df = 6.58), but
was improved by different biases (χ²/df = 2.27). Again, these results for the
Sentence Context condition contrast sharply with those for the Segregated Talker
test (Experiment II), where the predominant effect was on pairwise similarities,
not biases.
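The distinction between similarity parameters and bias parameters can be made concrete with a small sketch. It assumes the joint models follow the biased choice model of Luce (1963), in which the probability of response j to stimulus i is proportional to the similarity of i and j multiplied by the bias toward j; the fitting procedure is not described in this section and is not reproduced here. The function name and the three-vowel values are hypothetical and chosen only for illustration.

```python
def luce_choice_probabilities(similarity, bias):
    """Return P(response j | stimulus i) for a biased choice model.

    similarity[i][j] is a symmetric similarity value (1.0 on the diagonal);
    bias[j] is the response bias toward alternative j.
    """
    probabilities = []
    for row in similarity:
        # Weight each response alternative by similarity times bias.
        weights = [s * b for s, b in zip(row, bias)]
        total = sum(weights)
        probabilities.append([w / total for w in weights])
    return probabilities


# Hypothetical three-vowel example (illustrative values, not taken from the paper).
similarity = [
    [1.0, 0.3, 0.1],
    [0.3, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
neutral = luce_choice_probabilities(similarity, [1.0, 1.0, 1.0])
biased = luce_choice_probabilities(similarity, [1.0, 2.0, 1.0])

for label, table in (("equal biases", neutral), ("bias toward vowel 2", biased)):
    print(label)
    for row in table:
        print("  " + "  ".join(f"{p:.3f}" for p in row))
```

With the similarities held fixed, raising the bias toward the middle alternative draws responses toward it from both neighbors. This is the kind of error-producing shift in response biases that the text attributes to tempo uncertainty, as opposed to a change in the similarities themselves.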



It is interesting to note that the error rate for syllable-medial vowels in
sentence context (17.3 percent) was very close to that for medial vowels in
citation-form syllables (17.0 percent); the difference was not significant
[t(32) = 0.16]. This suggests that there is a very stable level of error for
vowels in /p-p/ words when heard in a unit of articulation sufficient to specify
tempo. The only additional assumption required is that a syllable spoken in
isolation specifies its own tempo.
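The reported statistic can be checked against the summary lines of Tables A-3 and A-9 in Appendix A. The sketch below assumes an independent-samples t test with pooled variance; the text does not name the test, so this is an assumption, and the function name is illustrative.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Mean number of errors per 90 trials, standard deviation, and number of listeners:
# Table A-9 (destressed syllables in sentence context): 15.53, 5.08, 15
# Table A-3 (citation-form syllables, Mixed Talker):    15.26, 4.53, 19
t_value, df = pooled_t(15.53, 5.08, 15, 15.26, 4.53, 19)
print(f"t({df}) = {t_value:.2f}")  # prints: t(32) = 0.16
```

Under that assumption the appendix values reproduce both the degrees of freedom and the t value given in the text.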

These results provide strong evidence that the perceptual system adjusts to
the ongoing tempo of a talker's utterance. However, it remains an open question
whether this adjustment involves transforming or calibrating a relational vowel
space for individual talkers. No evidence for a talker-specific space of this
kind was found in earlier experiments, nor was any found in the precursor
condition of this experiment. In addition, the effect of sentence context on
identification was very different from the effect of extended familiarization
with individual vocal tracts. Thus, this experiment provides no evidence that
sentence context aids vowel identification by allowing compensation for talker
differences.

Little is currently known about how formant contours are transformed by
variations in speaking rate and stress, or how listeners adjust to these changes.
Lindblom (1963) has attempted to characterize the variation in vowel center
formant frequencies as a function of speaking rate. Lindblom and Studdert-Kennedy
(1967), in turn, have demonstrated that listeners are sensitive to these
variations when identifying vowels in isolated, synthetic syllables. If two
syllables reach the same formant frequency values at the syllable centers, but
simulate different rates of articulation, listeners adopt different criteria for
identification of the two medial vowels. These preliminary efforts suggest that
the formant transitions, which are generally understood to carry consonantal
information, must also aid in specifying the vowel. They apparently do so, at
least in part, by limiting the range of possible talker tempos. The Sentence
Context condition of this experiment suggests that factors beyond the syllable
also shape the acoustic specification of vowels and are therefore important to
accurate identification. A major function of a carrier sentence is to specify
the tempo and stress of component syllables.⁸

⁸Gay's (1974) acoustic measurements suggest that the critical feature of
destressed syllables in natural sentences is that they are destressed, not that
they are rapidly spoken. Point vowels in rapidly spoken syllables did not
show the reduction toward schwa that is found in destressed speech (Lindblom,
1963). It is not clear what implications this has for the perceptual studies
of Lindblom and Studdert-Kennedy (1967) or the studies presented here. In both
cases, tempo variation has provided a plausible basis for explanation. Further
research is needed to determine whether perceived pace and syllable duration
are secondary to perceived stress in determining the pattern of listeners'
identifications.

SUMMARY AND CONCLUSIONS

These experiments lead to the following conclusions about the perception of
vowels in natural speech:



1. Talker-dependent acoustic variation does not pose a major perceptual
problem within a common dialect group. Listeners can identify a high proportion
of vowels spoken in citation-form syllables by talkers with whom they have little
or no previous experience. In Experiment I, listeners identified 87 percent of
/h-d/ syllables spoken in random order by 30 talkers representing the full
natural range of acoustic variation. In Experiment II, they identified 83
percent of /p-p/ syllables spoken by 15 talkers. Of the errors made in this
Mixed Talker condition, no more than half can be attributed to talker-dependent
sources of ambiguity. Correct identification in Segregated Talker tests averaged
90.5 percent for vowels in /p-p/ syllables (Experiment II). There was genuine
improvement in the identification of specific vowels, but only a small portion
of correct identification could be attributed to familiarization (the difference
between 83 and 90.5 percent). Thus, experience with a voice plays a secondary
role in specifying vowel identity. A single syllable contains substantial
information about its medial vowel, whether a talker's voice is familiar or not.

2. Contrary to the speculations of Joos (1948), Lieberman et al. (1972),
and Lieberman (1973), the point vowels do not play a major and privileged role
as calibrators of a talker-specific vowel space. Experience with a talker's
point vowels does not significantly reduce the overall ambiguity of vowels in a
subsequent syllable. This result was found for all three types of test
syllables studied: /h-d/, citation-form /p-p/, and destressed /p-p/. The pattern
of changes following point-vowel precursors did not resemble the pattern
resulting from extended experience with a talker's voice (Experiment II).
Extended experience produced consistent reductions in pairwise similarities,
while experience with a talker's point vowels mainly affected the pattern of
response biases, with no consistent effects on vowel identifiability. Point
vowels did produce a significant decrease in the confusability of /pɛp/ and
/pæp/, but they were not unique in this respect: a significant reduction was
also found when test syllables were preceded by central vowels (Experiment II)
and when tempo uncertainty was introduced (Experiment III). In general, there
was little evidence that sample subsets of a talker's vowels enable listeners to
adjust to the talker's idiosyncratic "space" (defined by ranges of acoustic
values or by sizes of vocal-tract cavities). This conclusion, like the first,
does not support the proposal of Ladefoged and Broadbent (1957) and Ladefoged
(1967) that vowel perception can be regarded as a problem in establishing an
adaptation level (cf. Shankweiler, Strange, and Verbrugge, in press).

3. Listeners adjust their perceptual criteria for syllable-medial vowels
according to the perceived rate of articulation. When destressed /p-p/
syllables were excised from sentence context and presented in isolation
(Experiment III), there was a tendency to perceive them as if they had been
spoken in citation form: the pattern of errors showed insufficient compensation
for the acoustic effects of rapid articulation. When citation-form precursor
strings preceded the excised syllables, the contrast of expected and actual
tempos enhanced the original pattern of errors and increased the overall error
rate. When the excised syllables were heard in their original temporal
environments (the carrier sentences), the pattern of errors reversed and the
overall error rate decreased. Carrier sentences apparently enabled listeners to
adjust continuously to a talker's tempo and to compensate for the acoustic
effects of vowel reduction. Information about a talker's ongoing tempo produced
a qualitatively different pattern of improvement from that produced by long-term
familiarization with citation-form syllables. This confirmed the results of
Experiment II (where citation-form test words were heard in the context of prior
citation-form syllables) in the more natural situation of words in sentence
context. In neither case was there evidence that listeners acquired a scaling
function for adjusting a talker's speech to a normative dialectal space. In
contrast to the conclusions of Ladefoged and Broadbent (1957), a naturally
produced carrier sentence may aid vowel identification more by establishing the
tempo of speech than by delimiting an individual's vowel space.

How do listeners cope with talker-related acoustic variation? One
possibility is that a single syllable (with consonants of known identity) carries
sufficient information for normalization to take place. Fourcin (1968) and
Rand (1971) both have demonstrated that listeners adjust their perceptual
criteria for stop consonants to compensate for talker-dependent variation in the
consonants' acoustic structure. If the consonants in a test syllable are known
in advance, a single syllable could provide relatively unambiguous information
about the talker's vocal tract. This information, in turn, could be used in
disambiguating the vowel.

A second possibility is that a talker-normalization procedure is not
necessary for human perception of vowels. Vowel identity may be specified by
properties of the acoustic signal that are relatively invariant across talkers
and that do not require a prior calibration process to be accurately detected.
The results for destressed syllables suggest that the dynamic properties of
speech are especially critical: vowel identification seems to be at least as
sensitive to tempo variation as it is to variation in talkers' center formant
frequencies. Adjustment to talkers may have more to do with tracking the
dynamics of ongoing articulation than with normalization as traditionally defined.

REFERENCES

Abramson, A. S. and F. S. Cooper. (1959) Perception of American English vowels
in terms of a reference system. Haskins Laboratories Quarterly Progress
Report QPR-32, Appendix 1.

Fourcin, A. J. (1968) Speech source inference. IEEE Trans. Audio
Electroacoust. AU-16, 65-67.

Gay, T. (1974) A cinefluorographic study of vowel production. J. Phonetics
2, 255-266.

Gerstman, L. H. (1968) Classification of self-normalized vowels. IEEE Trans.
Audio Electroacoust. AU-16, 78-80.

Goodman, L. A. (1969) How to ransack social mobility tables and other kinds of
cross-classification tables. Am. J. Sociol. 75, 1-40.

Goodman, L. A. (1970) The multivariate analysis of qualitative data:
Interactions among multiple classifications. J. Am. Stat. Assoc. 65, 226-256.

Helson, H. (1948) Adaptation level as a basis for a quantitative theory of
frames of reference. Psychol. Rev. 55, 297-313.

Joos, M. A. (1948) Acoustic phonetics. Language, Suppl. 24, 1-136.

Ladefoged, P. (1967) Three Areas of Experimental Phonetics. (New York:
Oxford University Press).

Ladefoged, P. and D. E. Broadbent. (1957) Information conveyed by vowels.
J. Acoust. Soc. Am. 29, 98-104.

Lieberman, P. (1973) On the evolution of language: A unified view. Cognition
2, 59-94.

Lieberman, P., E. S. Crelin, and D. H. Klatt. (1972) Phonetic ability and
related anatomy of the newborn, adult human, Neanderthal man, and the
chimpanzee. Am. Anthropol. 74, 287-307.

Lindblom, B. E. F. (1963) Spectrographic study of vowel reduction. J. Acoust.
Soc. Am. 35, 1773-1781.

Lindblom, B. E. F. and M. Studdert-Kennedy. (1967) On the role of formant
transitions in vowel recognition. J. Acoust. Soc. Am. 42, 830-843.

Lindblom, B. E. F. and J. Sundberg. (1969) A quantitative model of vowel
production and the distinctive features of Swedish vowels. Quarterly Progress
and Status Report (Speech Transmission Laboratory, Royal Institute of
Technology, Stockholm, Sweden) STL-QPSR 1, 14-52.

Luce, R. D. (1959) Individual Choice Behavior: A Theoretical Analysis.
(New York: Wiley).

Luce, R. D. (1963) Detection and recognition. In Handbook of Mathematical
Psychology, ed. by R. D. Luce, R. R. Bush, and E. Galanter. (New York:
Wiley), Vol. 1, pp. 103-189.

Peterson, G. E. (1961) Parameters of vowel quality. J. Speech Hearing Res.
4, 10-29.

Peterson, G. E. and H. L. Barney. (1952) Control methods used in a study of
the vowels. J. Acoust. Soc. Am. 24, 175-184.

Peterson, G. E. and I. Lehiste. (1960) Duration of syllable nuclei in English.
J. Acoust. Soc. Am. 32, 693-703.

Rand, T. C. (1971) Vocal tract size normalization in the perception of stop
consonants. Haskins Laboratories Status Report on Speech Research SR-25/26,
141-146.

Shankweiler, D., W. Strange, and R. R. Verbrugge. (in press) Speech and the
problem of perceptual constancy. In Perceiving, Acting, and Knowing:
Toward an Ecological Psychology, ed. by R. Shaw and J. Bransford.
(Hillsdale, N. J.: Lawrence Erlbaum Assoc.).

Shearme, J. N. and J. N. Holmes. (1962) An experimental study of the
classification of sounds in continuous speech according to their distribution
in the formant 1 - formant 2 plane. In Proceedings of the Fourth International
Congress of Phonetic Sciences. (The Hague: Mouton), pp. 234-240.

Stevens, K. N. (1972) The quantal nature of speech: Evidence from
articulatory-acoustic data. In Human Communication: A Unified View, ed. by
E. E. David, Jr., and P. B. Denes. (New York: McGraw-Hill), pp. 51-66.

Stevens, K. N. and A. S. House. (1963) Perturbation of vowel articulations by
consonantal context: An acoustical study. J. Speech Hearing Res. 6, 111-128.

Strange, W., R. R. Verbrugge, D. P. Shankweiler, and T. R. Edman. (in press)
Consonant environment specifies vowel identity. J. Acoust. Soc. Am.

Tiffany, W. R. (1959) Nonrandom sources of variation in vowel quality.
J. Speech Hearing Res. 2, 305-317.

Verbrugge, R., W. Strange, and D. Shankweiler. (1974) What information enables
a listener to map a talker's vowel space? Haskins Laboratories Status
Report on Speech Research SR-37/38, 199-208.






APPENDIX A: CONFUSION MATRICES

Tables report the frequency with which each intended vowel x was identified
as response alternative y. In addition, summary statistics for each condition
are provided: the percent error for each intended vowel, the overall percent
error for each repetition (rep.) of the test series, the overall percent error
pooling both repetitions, the total number of trials for the two repetitions,
the mean number of trials on which listeners made an error (x̄), the standard
deviation of this mean (s), and the number of listeners (N).



[Tables A-1 and A-2: confusion matrices not legible in the scanned source.]



TABLE A-3: Citation-form /p-p/ syllables: Mixed Talker condition.



\ 



Intended 




« 




Response 




vowel 


i 


I t 


as 


a 


o 


A 


i " , * 


188 


1 










I 




187 1 






2 




t 




139 


47 


3 






as 




33 


154 




2 




a 








152 


19: 


17 


o 






1 


46 


138 


1 


A 








18 


5 


161 


U 




8 




2 




47 


H 












, 2 



-u Nohe 



Percent 
error 



2 
4 
6 



16- 
185 




1.1 
1.6 

26.8 
18.9 
. 20.0 
27.4 
15.3 
38.9 
.^2.6 



I ■ ' ' 

Overall percent error: 16.96 (pooled), 18.48 (rep. 1), 15.44 (rep. 2);
90 trials, x̄ = 15.26, s = 4.53, N = 19.



TABLE A-4: Citation-form /p-p/ syllables: Segregated Talker condition.



Intended 
vowel 



ae. 



Response 

Q Of A 



u None 



329 1 - - , ' . , o 

3 318 ■ 4 ■ « 2 2 
1 290 20 4 7 ' >5 
5 324. i 1 / 

7 255 . 62 4.2, 

55 269 ° 2 4 

11 -'' 9 305 * 4 

' / 29 19 272 10 

.1 2 327 

^ — 



Percent
error

0.3
3.6
12.1
1.8
22.7
18.5
7.3
17.6
0.9

Overall percent error: 9.46 (pooled), 10.57 (rep. 1), 8.35 (rep. 2);
90 trials, x̄ = 8.52, s = 4.77, N = 33.






TABLE A-5: Citation-form /p-p/ syllables: Point-Vowel Precursor condition.



Intended' 



Response 



Percent 



vowel 


i I e 




a o 


A 


u 


u Npne 




— i . 


145 ^ 5 












3.3 


• I 


146 3 










1 


2.7 


e ' 


1 J.43 


4 - 


1 


• 1 




/ 


4.7 


ae 


30 


119 




• 1 






20.7* 


a 




1 


85 25 


36 


■ 3 




p.3 ' 


0 




1 


•9 '122 


14 


4 




18.7 


A 




• 

4 


3 7 


136 


4 




9.3 


u 






2 


31' 


'110 




' 26.7 


u 






• 




11 


139 


7.3 



Overall percent error: 15.19 (pooled), 17.48 (rep. 1), 12.89 (rep. 2);
90 trials, x̄ = 13.67, s = 5.26, N = 15.



TABLE A-6: Citation-form /p-p/ syllables: Central-Vowel Precursor condition.

Intended 
vowel 






Res{>onse 


A 




u None 


Percent 
^ error 


i . ^ 


116 3 


• 

* 










1 ■ 


3.3 




1 11'8 












. 1 


1.7 


c 




107 12 










1 


10.8 






22 . 98 












15.3 


a 






85 


20 


12 


J 




'29.2 J 


. 0 






13 


104 . 


1- 


1 


N 1 . ■ 


13.3 


A/ 






. 10 




"93 


9 




22.5' 


U ' 










24 


85 


5 


29.2 














' 7 


'113-. 


■ 5.'8: 



Overall percent error: 14.91 (pooled), 15.00 (rep. 1), 14.81 (rep. 2);
90 trials, x̄ = 13.42, s = 3.78, N = 12.






TABLE A-7: Destressed /p-p/ syllables: No-Precursor condition.



Intended 
vowel 


1 1 


e • 


as' 


.Response 
a 0 


A 


u 


t 

u 


None 


Percent 
error 


i 


177 • 16' 


6 












1 




11.5 


I* 


J.99 














1 




0.5 




2 


164' 


' 7 




1 


2 




2 




7.9 


s . 




48 


151* 




1 










24.5 


Q 








75 


39 


76 


10 






62.5 


3 






2 


48 


101 


43 


6 






49.5 


A 


8 


"5 




1 


15 


134 


35' 


1 


1 


33.0 


U 




1 




1 


2 


22 


162 


12 




19.0 


'u 












2 


7 


191 




4.5 



Overall percent error: 23.84 (pooled), 25.19 (rep. 1), 22.49 (rep. 2);
90 trials, x̄ = 22.00, N = 9; 88 trials, x̄ = 20.55,
N = 11; pooled scores: x̄ = 21.20, s = 4.98, N = 20.

Two trials lost for 11 subjects.



TABLE A-8: Destressed /p-p/ syllables: Point-Vowel Precursor condition. 

Intended                 Response                  Percent
vowel    i ɪ ɛ æ ɑ ɔ ʌ ʊ u None                    error

i        151 6 1 2 10                              11.2
ɪ        167 3                                      1.8
ɛ        2 164 3 1                                  3.5
æ        74 95 1                                   44.1
ɑ        1 2 1 7 6 151 2                           95.9
ɔ        8 84 69 9                                 50.6
ʌ        1 3 1 1 13 123 25 3                       27.6
ʊ        4 1 11 139 15                             18.2
u        1 1 6 162                                  4.7



Overall percent error: 28.63 (pooled), 28.89 (rep. 1), 28.37 (rep. 2);
90 trials, x̄ = 25.76, s = 4.70, N = 17.
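The percent-error column and the pooled error rate of Table A-8 can be cross-checked from the counts of correct responses alone. The sketch below assumes 170 responses per intended vowel (17 listeners, each hearing 10 tokens of each of the 9 vowels in the 90-trial test); this total is inferred from the summary line rather than stated in the table, and the variable names are illustrative.

```python
def percent_error(correct, total):
    """Percentage of responses that were not the intended vowel."""
    return 100.0 * (total - correct) / total


vowels = ["i", "ɪ", "ɛ", "æ", "ɑ", "ɔ", "ʌ", "ʊ", "u"]
# Correct identifications per intended vowel, read from Table A-8.
correct_counts = [151, 167, 164, 95, 7, 84, 123, 139, 162]
# Assumption: 17 listeners x 10 presentations of each vowel.
responses_per_vowel = 17 * 10

per_vowel = {
    vowel: round(percent_error(count, responses_per_vowel), 1)
    for vowel, count in zip(vowels, correct_counts)
}
overall = round(
    percent_error(sum(correct_counts), responses_per_vowel * len(vowels)), 2
)

for vowel in vowels:
    print(f"{vowel}: {per_vowel[vowel]}")
print(f"overall: {overall}")
```

Under the stated assumption, the computed values agree with the percent-error entries and with the pooled figure of 28.63 reported for this condition.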






TABLE A-9: Destressed /p-p/ syllables: Sentence Context condition.





^ % 




Response 








rercenti 


vowel 

^ 




Q 




A 


u 


|U iNone 


error 




? 140 10 












\ 


6.-7 


I 


149 . ' 












1 


, 0.7 


ae 


120 
2 


29 
147 




1 




1 




20.0 
2.0 






2 


95 


36 


15 


1 


1 


36.7 


D 






41 


103 


3 


3 




31.3 


•A 


\ 1 


1 


8 




lOQ 


24 




33.3 


U 






1 


4 


20 


115 


10 


23.3 


U 






1 






1 


148 • 


1.3. 



Overall percent error: 17.26 (pooled), 18.22 (rep. 1), 16.30 (rep. 2);
90 trials, x̄ = 15.53, s = 5.08, N = 15.



Identification of Dichotic Fusions*
Bruno H. Repp+



ABSTRACT 

Seven synthetic syllables from a "place continuum" (/bæ - dæ -
gæ/) were presented in all dichotic combinations for identification.
These syllables fused completely, so that dichotic pairs were
perceived as single stimuli. The response pattern could not be easily
explained by an "auditory averaging" hypothesis. Rather, stimuli
that were good instances of a category seemed to "dominate" stimuli
that were closer to a category boundary. To account for this
finding, a three-stage pattern recognition ("prototype") model is
proposed according to which the information from the two ears is
integrated after auditory but before phonetic-categorical processing, at
a "multicategorical" stage. Electronically mixed stimuli led to a
similar response pattern, suggesting that competing transitional cues
remain intact up to the multicategorical stage. It is demonstrated
that these fusions cannot be reliably discriminated from binaural
stimuli, and that selective attention to one ear has little effect.
For the purpose of assessing ear advantages, dichotic fusions offer
methodological advantages over other dichotic stimuli. The problem
of determining the "true" ear advantage is discussed.

INTRODUCTION

In recent years, dichotic listening has received much attention, both as a
research tool for the investigation of the processes involved in speech
perception and as a diagnostic technique for assessing hemispheric dominance for
speech.¹ Both aspects are addressed by this paper, which, on the basis of a
detailed analysis of the dichotic interaction between the voiced stop
consonants, makes recommendations for a possible methodological refinement of
dichotic testing.

*A substantially revised version of this paper is to be published in the
Journal of the Acoustical Society of America. Authors who wish to refer to
this research are urged to consult the revised version.

+Also University of Connecticut Health Center, Farmington.

Acknowledgment: This research was conducted at Haskins Laboratories and would
not have been possible without the extraordinary hospitality of this
institution and its director, Alvin Liberman. I thank him, Michael
Studdert-Kennedy, James Cutting, Terry Halwes, Gary Kuhn, and David Paul for
comments and discussions related to this paper. The author was supported by NIH
Grant T22 DE00202 to the University of Connecticut Health Center.

[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]

Dichotic tests composed of synthetic stop-consonant-vowel syllables have
become widely accepted as the most precise instruments currently available for
assessing ear advantages in speech perception (Shankweiler and Studdert-Kennedy,
1967a, 1975). The control of stimulus characteristics and channel
synchronization made possible by modern speech synthesizers and specialized
computer systems, together with the balanced stimulus set of the six stop
consonants, gives these tests a distinct advantage over other materials and
procedures. Nevertheless, some problems remain. One is the kind and number of
responses to be required from the listeners: two responses (with or without
restrictions on their order) or one response (with or without
selective-attention instructions)? Variants of both response modes have been
used at one time or another, but two-response paradigms have dominated the
scene. However, because of the occurrence of confusions, intrusions, and
guessing, and the lack of a good theory taking these phenomena into account, the
two responses cannot be unequivocally assigned to the stimuli that evoked them,
so that errors and correct responses are not clearly separated in scoring the
results (cf. Repp, 1975a, 1976). Selective-attention instructions offer no
remedy, since selective attention is very difficult with precisely aligned
dichotic syllables, and intrusions from the unattended channel are common
(Halwes, 1969; Haggard, 1975; Repp, 1975a).

Another problem has been the derivation of an index for the ear advantage.
Simple percentage differences have the disadvantage that they depend on the
overall performance level and therefore do not adequately represent the degree
of an ear advantage but merely measure its direction. The proposal of Kuhn
(1973) to use the φ coefficient as a measure of the ear advantage has been an
important step forward. However, Kuhn's index is designed for two-response
paradigms (or single-response paradigms with selective-attention instructions)
and therefore does not solve the problem of unraveling correct responses and
errors.
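The difference between the two kinds of index can be illustrated with a short sketch. It assumes that the φ index is the ordinary phi coefficient of a 2 x 2 table crossing ear (right, left) with response accuracy (correct, incorrect); the exact computation used by Kuhn (1973) is not given in this section. The function names and the response counts are hypothetical.

```python
import math


def percent_difference(right_correct, right_total, left_correct, left_total):
    """Right-ear minus left-ear percent correct."""
    return 100.0 * (right_correct / right_total - left_correct / left_total)


def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2 x 2 table [[a, b], [c, d]].

    Here a, b are right-ear correct/incorrect counts and
    c, d are left-ear correct/incorrect counts.
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Two hypothetical listeners with the same 10-point percentage difference
# at different overall performance levels (100 trials per ear).
high = {"right": (90, 10), "left": (80, 20)}
low = {"right": (55, 45), "left": (45, 55)}

for label, data in (("high performance", high), ("near chance", low)):
    (rc, ri), (lc, li) = data["right"], data["left"]
    diff = percent_difference(rc, rc + ri, lc, lc + li)
    phi = phi_coefficient(rc, ri, lc, li)
    print(f"{label}: difference = {diff:.1f} points, phi = {phi:.3f}")
```

The same 10-point difference corresponds to different φ values at the two performance levels, which shows why a raw percentage difference cannot by itself express the degree of an ear advantage.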


Halwe^ (1969) and Studcjert-Kennedy and Shankweiler (1970) have pointed o\it 
the l^w information content of the second of two responses. This observation 
Suggests that it may be more appropriate to ask for a single response only^^ In 
"fact, it seems that listeners often perceive only a-singl^Nsyllable when a ^ 
dichotic pair.^is presented. This tendency is more pronounced ^th syllables 
contrasting in only a single distinctive feature^ (voicing, for example. 



See, for example. Brain and Language , 1974, VoL.l, No. 4 and 1975, 'Vol. 2, 
No. 2. 

2 • ' ' ' • 

A comment on terminology is in order here. Many authors refer to "shared fea-^ 

tures" rather than "feature -contrasts," for example, /haf and /pa/ "share 

place" (Studdert-Kennedy and Shankweiler, 1970; Pisoni and McNabb, 1974). This 

terminology is awkward, for several reasons: (1) Any characterization in terms 

.of shared features is. indeterminate unless all shared features are enumeVatecl 

(which includes many irrelevant features) , whereas mentioning the contrasting 

features is informative eVen withopfe^ precise knowledge of' the complete stimulus 

set. (2) Features are dimensions and therefore are always shared, precisely 

•96 . ' ' ^ ' " 



** 3 

/ba+pa/; or place,* for example, /ba+da/) than with>^llables contrasting in 
both features (for. example, /ba + ta/): in a ''sama-dif fkrent'' judgment task,' 
the former receive more incorrect "same'* responses than the latter. Moreover, 
within the single- feature contrasts, place contrasts ar'^^Wich harder to discrim- ' 
inate from identical (binaural) syllables than voicing cidtrasts (Halves, 1969; ^ 
Blumstein apd Cooper, 1972; Repp, 1976). In other words, precisely aligned 
simultaneous dichotic syllables that dif-fer only in the direction of their in- 
itial formant transitions strongly tend to fuse and sound like a single syllable 
"originating itx .the middle of .t^ie Ji^^^if their intensities are equal). 

Cutting (>L972, 1976) has proposed a classif icatit>n of dichotic fusions that 
includes "psych'oacoustic fusions": when- /ba+ga/ is presented, /da/ is often 
heard. We will foll^ow Cutting and use the terta "psychoacoiistic fusicfn" only 
for this specifio phenomenon. However, it should be clear that fusion in the 
more general sense — hearing only a single stimulus wh^n two are presented — 
occurs independently of, the nature of the phonetic percept.^ Thus, /ba+ga/ 
sounds just as fused when /ba/ or /ga/ is heard ad wh^en /da/ is Heard, and^ 
7ba + da/ fuses just as well, although it will never give rise to a "new^* response. 

These considei;atioi\s suggest that it is useless to require a listener to 
give two response's when a dichotic place contrast is presented. A single re- 
sponse will contain virtually all ,the. information available to* the listener. 
(Upwever, it may be usefully supplemented by a measure o£ .response uncertainty, ^ 
such as confidence ratings, reaction times,' ot response distributions.) The 

, principal question^ is then: Hbw is the information from the two ears^combined ^ 
into a single percept? Cutting (2»72, 1976) has suggested , that 'psychoacoustic 
fusion is a relatively low-level auditory averaging phenomenon. Any such ex- 
planation should apply to all dichotic place contrasts. The present experiments 
attempt to investigate this question further by examining the identification of 

. dichotic fusions in some detail. " . ' . 

♦ , • 

From a methodological standpoint, it is important to determine whether 
dichotic fusions lead to the right-ear advantage (REA) commonly found in diphotic- 
listening. Several studies have indicated that place contrasts show a somewhat 



speaking. It is their values that may differ, and this seems to be somewhat
better captured in the term "feature contrast" (that is, a contrast with
respect to a feature) than in "shared feature." (3) Most importantly, feature
sharing has often been interpreted as a factor facilitating dichotic
perception. However, there is no known factor in dichotic listening that facilitates
perception relative to monaural or binaural presentation; rather, performance
is impaired by competition as a consequence of feature contrasts. Therefore,
the latter term will be used here exclusively.

The notation i+j will be used to indicate a dichotic stimulus pair regardless
of channel/ear assignment of the component stimuli, while i-j and j-i will
designate the two specific channel assignments (i and j stand for stimulus
numbers; see Table 1).

Conversely, it may also be argued that, within the set of the six stop
consonants at least, there is characteristically only one perceptual result,
regardless of whether phenomenological fusion occurs.




smaller REA than other feature contrasts (Shankweiler and Studdert-Kennedy,
1967a, 1967b; Studdert-Kennedy and Shankweiler, 1970). Since the place
contrasts in these studies may not have been perfectly fused, the difference may
in fact be larger. This is interesting with regard to the question at which
level(s) in processing the REA arises. If it were the case that dichotic place
contrasts fuse at a very early stage in processing and then are transmitted in
this form to each hemisphere, there should be no REA, since the REA is usually
attributed to transcallosal transmission loss of left-ear information, assuming
functional independence of the dichotic inputs prior to their convergence upon
the dominant hemisphere (Studdert-Kennedy, 1975). On the other hand, fusion
may either occur at a higher level (after central convergence) or be an entirely
autonomous phenomenon mediated by an independent low-level cross-correlational
mechanism, so that fused syllables are processed in basically the same way as
less completely fused syllables; in this case, there should be no difference in
REAs between the two.

EXPERIMENT I 

The first experiment examined the identification of fused dichotic stimuli
from a "place continuum" (Pisoni, 1971) obtained by systematically varying the
starting frequencies of the initial formant transitions. The principal
questions were whether identification responses could be predicted by a simple
auditory averaging model, whether a significant REA of "normal" magnitude exists,
and whether psychoacoustic fusions are as common as suggested by Cutting (1972).
The effects of variations in the acoustic properties and relationships of the
fused stimuli were of prime concern with respect to all three questions.

Method

Subjects. Thirteen paid volunteers participated, seven males and six
females, all right-handed, unaware of any hearing trouble, and relatively
inexperienced listeners. The data of two additional subjects were eliminated
because they were too noisy.

Stimuli. The stimuli were seven syllables ranging perceptually from /bæ/
to /dæ/ to /gæ/. They were produced on the Haskins Laboratories parallel
resonance synthesizer. All syllables were of 280-msec duration, had a constant
fundamental frequency (114 Hz), a voice onset time of -15 msec (that is,
prevoicing), 45-msec linear transitions, and no bursts but an abrupt onset of
energy following the prevoicing. The syllables differed only in the onset
frequencies of the second-formant (F2) and third-formant (F3) transitions, which
are shown in Table 1.

Dichotic pairs were constructed using the pulse code modulation (PCM)
system at Haskins Laboratories. The stimulus alignment precision of this
computerized procedure is ±0.125 msec. All possible combinations of the seven stimuli
were recorded. In order to obtain stable identification scores for the seven
syllables in isolation (that is, binaurally), pairs of identical syllables were
replicated six times, so that there were 84 stimuli altogether: 42 identical



The recent paper of Cutting (1976) was not available at the time of the
experiment.



TABLE 1: Starting frequencies (in Hz) of second-formant (F2) and
third-formant (F3) transitions of the seven stimuli.

Stimulus number         F2        F3

       1               1312      2348
       2               1450      2694
       3               1620      3026
       4               1772      3026
       5               1920      2694
       6               2078      2348
       7               2234      2018

Steady-state /æ/       1620      2862

(binaural) pairs and 42 nonidentical (dichotic) pairs. Five different random
sequences of the 84 stimuli were recorded. The interstimulus interval was 3
sec.
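The stimulus counts given above can be checked with a short sketch. It assumes that "all possible combinations" means every ordered (channel-specific) pairing of two different stimuli; the names `STIMULI`, `BINAURAL_REPLICATIONS`, and `build_tape_items` are illustrative and do not come from the report.

```python
from itertools import permutations

STIMULI = range(1, 8)        # the seven synthetic syllables of Table 1
BINAURAL_REPLICATIONS = 6    # each identical pair was recorded six times


def build_tape_items():
    """Return the binaural (identical) and dichotic (nonidentical) pairs of one block."""
    # Identical pairs: the same syllable on both channels, replicated six times.
    binaural = [(s, s) for s in STIMULI for _ in range(BINAURAL_REPLICATIONS)]
    # Nonidentical pairs: assumed to be every ordered pairing, i.e. both
    # channel assignments i-j and j-i of each combination i+j.
    dichotic = list(permutations(STIMULI, 2))
    return binaural, dichotic


binaural, dichotic = build_tape_items()
combinations_ignoring_channel = {frozenset(pair) for pair in dichotic}

print(len(binaural), len(dichotic), len(binaural) + len(dichotic))
print(len(combinations_ignoring_channel))
```

Under this reading, one block holds 42 binaural and 42 dichotic items, and the dichotic items reduce to 21 stimulus combinations i+j when channel assignment is ignored.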

Procedure. The subjects were tested individually or in small groups in a
single session lasting approximately 90 minutes. Playback was from an Ampex
AG-500 tape recorder through an amplifier to Grason-Stadler TDH-39 earphones.
Playback intensity was adjusted and monitored on a Hewlett-Packard voltmeter,
and special care was taken to equalize the intensities of the two channels at
about ^5 dB SPL (peak deflections).

Each subject listened twice to the five blocks of 84 stimuli. The channels
were reversed electronically after the first five blocks. The instructions were to
write down one response for each syllable heard: B, D, or G, whichever the
syllable sounded most like.



The subjects were generally not informed until after the experiment that
different inputs were presented to the two ears in half of the stimuli. (There
were some exceptions, because some subjects had previously participated in
related experiments with dichotic fusions.) Most subjects agreed when questioned
that they heard only single syllables and showed surprise when told about their
actual nature. This, together with the experimenter's impression, was
considered sufficient evidence for the adequate fusion of the stimuli. (Formal tests
were conducted later in Experiment III with different subjects.)

Results and Discussion

The response pattern. The pooled results of the 13 subjects are shown in
Figure 1. The numbers in the graphs represent identical (binaural) pairs, and
the dashed lines connecting them trace the categorical identification functions
for the seven stimuli. It can be seen that stimuli 1 and 2 were generally
identified as B; 3 and 4, as D; and 6 and 7, as G. Stimulus 5 was the only truly
ambiguous syllable, with somewhat more D than G responses. (The stimulus numbers
refer to Table 1.) Some subjects produced noisy data, which is reflected in the
averages; for example, G responses to stimuli 6 and 7 reached only 85-86 percent.





[Figure 1. Ordinate: percent responses.]



Consider now the other symbols in Figure 1 that represent the dichotic
combinations of different stimuli. Each function connects the pairs formed by one
particular stimulus (denoted by the number at one end of the function) and the
stimuli along the abscissa. The pattern may be described as follows:

(1) When a particular stimulus was paired with other stimuli, the
percentage of responses in the relevant category tended to decrease as the competing
stimuli were further and further removed on the continuum. This was especially
clear for D responses, while the functions for B and G responses became flat and
even nonmonotonic when /bæ/ and /gæ/ stimuli were paired with stimuli more than
two or three steps removed on the continuum. Note that B responses were at a
minimum in pairs with stimulus 4, while G responses tended to be at a minimum in
pairs with stimulus 3.

(2) The percentages of responses in the three categories generally remained
in proportion to the binaural identification results for the component syllables
of a dichotic pair; for example, the B-function for stimulus 2 (upper left-hand
panel in Figure 1) lies uniformly lower than that for stimulus 1, and that for
stimulus 3 is even lower. More interesting, however, is the fact that a similar
difference exists between the G-functions for stimuli 6 and 7 (upper right-hand
panel in Figure 1), although these two stimuli showed identical binaural
identification scores. In addition, there is one crossover of functions: the
D-function for stimulus 3 lies above that for stimulus 4 in pairs with stimuli 5, 6,
and 7 (lower left-hand panel in Figure 1).

(3) There was a tendency for /bæ/ stimuli (especially 1) to dominate /gæ/
stimuli (6 and 7). A change in acoustic structure at the /bæ/ end of the
continuum had a greater effect than an equivalent change at the /gæ/ end, as
indicated by the wider spacing of the B functions (cf. upper panels in Figure 1).

(4) Psychoacoustic fusions were clearly present but rather infrequent,
especially in pairs containing stimulus 1. The numerical results for the four
relevant stimulus pairs are shown in Table 2.

TABLE 2: Response percentages for four dichotic /bæ + gæ/ pairs
(13 subjects).

                       Responses
Stimulus pair       B       D       G

   2 + 6          38.5    25.4    36.1
   2 + 7          45.0    21.5    33.5
   1 + 6          67.3    12.7    20.0
   1 + 7          60.8     9.6    29.6

Psychoacoustic fusions: Three averaging hypotheses. The pattern of
results just described (particularly under paragraphs 2 and 4) definitely rules
out a "phonetic averaging" (attention-switching or rivalry) hypothesis: If, for
example, the two stimuli competed for a single phonetic processor, so that one
syllable gained access to the processor in a certain percentage of the trials
while the other syllable was lost, the distribution of identification responses






for a dichotic pair would be a weighted average of the response distributions
for the two component stimuli in isolation. The same would be true if both
syllables were categorized independently in separate processors, and an attentional
mechanism with limited capacity selected one or the other outcome on a
probabilistic basis. Instead, the existence of psychoacoustic fusions and of effects of
acoustic within-category differences is evidence that the dichotic information
interacts prior to the completion of phonetic processing.
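The argument against phonetic averaging can be made concrete with a small sketch. A weighted average of two response distributions can never contain more D responses than the larger of the two component D percentages, whatever the weights. The binaural distributions `iso_2` and `iso_6` below are hypothetical placeholders (only the roughly 85 percent G for stimulus 6 is taken from the text); the observed D percentage for pair 2 + 6 is the Table 2 value.

```python
def mixture(dist_a, dist_b, weight_a):
    """Response distribution predicted if syllable A wins on a fraction
    weight_a of the trials and syllable B wins on the rest."""
    return {
        category: weight_a * dist_a[category] + (1.0 - weight_a) * dist_b[category]
        for category in dist_a
    }


# Hypothetical binaural identification percentages (illustration only).
iso_2 = {"B": 90.0, "D": 8.0, "G": 2.0}
iso_6 = {"B": 5.0, "D": 10.0, "G": 85.0}

observed_d_2_plus_6 = 25.4   # Table 2, pair 2 + 6

# Largest D percentage the mixture can produce for any weighting.
max_predicted_d = max(
    mixture(iso_2, iso_6, w / 100.0)["D"] for w in range(101)
)

print(max_predicted_d, observed_d_2_plus_6)
```

With any plausible binaural values of this kind, the mixture prediction stays at or below the larger component D percentage, so an observed D rate well above it cannot arise from attention switching alone.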

A second hypothesis may be termed "articulatory averaging" (Cutting, 1976).
It is similar to the phonetic averaging hypothesis, except that it allows for
psychoacoustic fusions by perceptual-articulatory interpolation at the feature
level. However, it excludes any interaction between the acoustic properties of
the stimuli and therefore is clearly disconfirmed both by the present data and
by Cutting's own.

On the other hand, the data are superficially in accord with an "auditory
averaging" hypothesis, which assumes that the formant transitions of the two
competing stimuli (or rather, their equivalent auditory codes in the brain)
fuse to yield new, intermediate transitions, and the resulting new information
is then phonetically interpreted. This hypothesis has also been considered by
Cutting (1976), who independently investigated the effect of acoustic
stimulus variations on the frequency of psychoacoustic fusions. However, one
prediction would then be that /bæ + gæ/ stimulus pairs, such as 1 + 7 and 2 + 6, which
have about the same "average," should yield the same percentage of D responses.
Instead, the acoustically more similar pair, 2 + 6, led to more psychoacoustic
fusions than the acoustically more dissimilar pair, 1 + 7 (cf. Tables 2 and 3,
upper left-hand quadrant), which parallels the results of Cutting (1976).
Therefore, Cutting's conclusion that simple averaging of formant transitions is an
insufficient explanation also applies to the present data.
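The claim that pairs 1 + 7 and 2 + 6 have about the same "average" can be checked against the onset frequencies in Table 1. The sketch below assumes a linear, unweighted mean of the onset frequencies, which the report itself calls naive, and assumes that the unlabeled last row of Table 1 belongs to stimulus 7; the function names are illustrative.

```python
# Onset frequencies in Hz, transcribed from Table 1.
F2 = {1: 1312, 2: 1450, 3: 1620, 4: 1772, 5: 1920, 6: 2078, 7: 2234}
F3 = {1: 2348, 2: 2694, 3: 3026, 4: 3026, 5: 2694, 6: 2348, 7: 2018}


def average_onsets(i, j):
    """Unweighted mean F2 and F3 onset frequencies of stimuli i and j."""
    return (F2[i] + F2[j]) / 2, (F3[i] + F3[j]) / 2


def nearest_stimulus_by_f2(f2_onset):
    """Continuum stimulus whose F2 onset lies closest to f2_onset."""
    return min(F2, key=lambda s: abs(F2[s] - f2_onset))


for pair in [(1, 7), (2, 6)]:
    f2_mean, f3_mean = average_onsets(*pair)
    print(pair, f2_mean, f3_mean, nearest_stimulus_by_f2(f2_mean))
```

On these values, the mean F2 onsets of the two pairs lie within about 10 Hz of each other and both fall closest to stimulus 4, a /dæ/ stimulus, whereas the mean F3 onsets differ by several hundred hertz. A pure F2 average would therefore predict equal D rates for the two pairs.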

Another problem with the auditory averaging model is its deterministic
nature. There is no /bæ + gæ/ stimulus pair for which only D responses are
obtained. In fact, the frequency of psychoacoustic fusions in the present
experiment was surprisingly low. Nine of the thirteen subjects showed negligible
frequencies (less than 7 percent, after a correction for expected confusions).
One reason for this may have been the presence of F3 transitions, which were
rising for /bæ/ and /gæ/ stimuli but falling in /dæ/ stimuli. In /bæ + gæ/ pairs,
the "average" F2 transition may have been in conflict with the "average" F3
transition, so that the responses tended to shift among all three alternatives.
The classical studies of Harris, Hoffman, Liberman, Delattre, and Cooper (1958)
and Hoffman (1958) have shown (incidentally, also in the context /-æ/) that F3
transitions have a strong influence on the tendency to give D responses, with
F2 transitions held constant: rising transitions decrease and falling
transitions increase D responses. Cutting (1976) used two-formant syllables and
obtained higher percentages of psychoacoustic fusions than the present study;



Of course, the assumption of a linear (unweighted) auditory averaging process
is naive and probably wrong. However, the conclusion that acoustic similarity
plays a role seems nevertheless justified. The present results differ from
those of Cutting (1976) with respect to the relative weight of low-frequency
and high-frequency transitions. Here, low-frequency changes had a greater
effect, while Cutting's data (for /ba + ga/) show precisely the opposite.


however, he also encouraged D responses by presenting only /ba + ga/ pairs to
uninformed subjects who were given three response alternatives.

In order to check further on the role of F3 transitions, a new stimulus
tape was prepared that contained all dichotic and binaural pairs of seven
syllables identical with those of Experiment I, except that they had no third
formant. BHR, who had also participated in five sessions of Experiment I,
listened to 30 random blocks of 49 stimulus pairs each, in three sessions. The
results closely resembled his results with three-formant syllables, except for
two of the four /bæ + gæ/ combinations. These results are shown in Table 3
(upper portion). The pooled response distribution for the four two-formant
/bæ + gæ/ pairs differed significantly from that for the corresponding
three-formant pairs (χ²(2) = 7.6, p < .05) but, clearly, the difference was due only to
2 + 6 and 2 + 7, which showed greatly increased frequencies of D responses. (Note
that BHR generally gave an unusually high percentage of psychoacoustic fusion
responses.)

TABLE 3: Response percentages for four dichotic /bæ + gæ/ pairs: comparison of
three-formant and two-formant syllables in dichotic and mixed
presentation (data for a single practiced subject, BHR, based on 3-5
sessions per condition).

                                  Responses
Stimulus pair       Three formants            Two formants
                    B      D      G           B      D      G

Dichotic
   2 + 6          28.0   47.0   25.0        26.7   70.0    3.3
   2 + 7          36.0   38.0   26.0        23.3   53.4   23.3
   1 + 6          71.0   26.0    3.0        73.3   23.4    3.3
   1 + 7          63.0   30.0    7.0        71.7   25.0    3.3

Mixed
   2 + 6          55.0   25.0   20.0        36.7   58.3    5.0
   2 + 7          96.3    2.5    1.2        86.7   11.7    1.7
   1 + 6          46.3   38.8   15.0        30.0   56.7   13.3
   1 + 7          77.5   10.0   12.5        80.0    6.7   13.3

It may be concluded that, in two pairs at least, the conflict between F2
and F3 transitions probably played a role. However, even in the absence of a
third formant, psychoacoustic fusions were far from the 100 percent predicted
by a simple auditory averaging hypothesis. If this hypothesis is to be
maintained, considerable random variability in the weighting function of the
averaging process must be assumed. This assumption will be tested in Experiment II.

Ear dominance and stimulus dominance. In order to correct for perceptual
confusions between the stimulus categories (especially those provided by an
ambiguous stimulus), left-ear and right-ear scores were derived for each
stimulus pair. This was done by weighting each response by the relative frequencies
of this particular response category for the two component stimuli in isolation,
and by subsequent summation of these weights for each ear. Expressed formally,




the right-ear score for a given dichotic pair i-j (with i in the right ear and j
in the left ear) was computed as

$$T_{RE(i)} = \sum_{k} f(R_k \mid i\text{-}j)\,\frac{f(R_k \mid i)}{f(R_k \mid i) + f(R_k \mid j)} \qquad (1)$$

where f(R_k|i-j) is the frequency of response category R_k for the dichotic pair,
f(R_k|i) and f(R_k|j) are the frequencies of response R_k to i and j, respectively,
when presented in isolation, and the summation is over the three response
categories. For the left ear, T_LE(j) = N - T_RE(i), where N is the total number of
responses to this stimulus pair. The weight (the fraction) in Eq. (1) was set
equal to 0.5 whenever the combined responses to i and j in a particular category
constituted less than 10 percent. The resulting scores are free from overt
variations in performance level, since the scores for the two ears always sum up
to N; that is, there are no errors by definition. Because of the weighting
procedure, individual variations in accuracy (which do exist) play only a negligible
role as long as the "noise" does not exceed a certain level.
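The weighting procedure of Eq. (1) can be sketched as follows. This is one possible reading, not the original analysis program: it assumes that the isolation frequencies are expressed as proportions, and that the 10 percent criterion refers to the proportion of responses in a category pooled over both component stimuli. The function name, the `floor` parameter, and all numerical data are hypothetical.

```python
def right_ear_score(pair_counts, iso_right, iso_left, floor=0.10):
    """Right-ear score for a dichotic pair, following the verbal description
    of Eq. (1).

    pair_counts: response counts per category for the dichotic pair.
    iso_right, iso_left: response proportions per category for the right-ear
        and left-ear stimulus when presented in isolation (binaurally).
    floor: assumed form of the 10 percent criterion (pooled proportion).
    """
    score = 0.0
    for category, count in pair_counts.items():
        p_right = iso_right[category]
        p_left = iso_left[category]
        pooled = (p_right + p_left) / 2.0
        if pooled < floor:
            weight = 0.5  # category too rare in isolation to favor either ear
        else:
            weight = p_right / (p_right + p_left)
        score += count * weight
    return score


# Hypothetical data: stimulus 1 in the right ear, stimulus 7 in the left ear.
iso_1 = {"B": 0.95, "D": 0.04, "G": 0.01}
iso_7 = {"B": 0.05, "D": 0.09, "G": 0.86}
counts_1_7 = {"B": 6, "D": 1, "G": 3}

n = sum(counts_1_7.values())
t_re = right_ear_score(counts_1_7, iso_1, iso_7)
t_le = n - t_re
print(round(t_re, 2), round(t_le, 2))
```

Because the left-ear score is defined as the remainder, the two scores sum to the number of responses regardless of how noisy the isolation data are, which is the property the text emphasizes.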

The two scores for a given dichotic pair, T_RE(i) and T_LE(j), have
counterparts in the two scores for the other channel assignment of the same stimulus
combination, T_RE(j) and T_LE(i). These four scores were arranged in two
different two-way contingency tables, and two φ coefficients were calculated: the
stimulus dominance index

$$\phi_D = \frac{T_{RE(i)} - T_{RE(j)}}{(T_{RE}\,T_{LE})^{1/2}}, \quad \text{with } T_{RE} = T_{RE(i)} + T_{RE(j)} \text{ and } T_{LE} = T_{LE(i)} + T_{LE(j)}, \qquad (2)$$

which indicates the degree to which stimulus i "dominates" stimulus j; and the
ear dominance (or ear advantage) index

$$\phi_E = \frac{T_{RE(i)} - T_{LE(i)}}{(T_{(i)}\,T_{(j)})^{1/2}}, \quad \text{with } T_{(i)} = T_{RE(i)} + T_{LE(i)} \text{ and } T_{(j)} = T_{RE(j)} + T_{LE(j)}, \qquad (3)$$

which describes the relative dominance of the right ear over the left ear.
Overall indices were obtained by calculating φ coefficients from summed response
frequencies, with separate summations for i-j and j-i pairs (arbitrarily
assuming that i<j on the stimulus continuum). The significance of these indices was
tested by χ²(1) = Nφ² (cf. Kuhn, 1973).
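A minimal sketch of the two indices follows, assuming the four scores are combined as in Eqs. (2) and (3) and that significance is assessed with the standard relation χ²(1) = Nφ², where N is the total over both channel assignments. The function names and the example scores are hypothetical.

```python
import math


def dominance_indices(t_re_i, t_le_j, t_re_j, t_le_i):
    """Stimulus dominance and ear dominance phi coefficients.

    t_re_i, t_le_j: scores for channel assignment i-j (i in the right ear).
    t_re_j, t_le_i: scores for channel assignment j-i (j in the right ear).
    """
    n_total = t_re_i + t_le_j + t_re_j + t_le_i
    t_re = t_re_i + t_re_j   # right-ear marginal
    t_le = t_le_i + t_le_j   # left-ear marginal
    t_i = t_re_i + t_le_i    # marginal for stimulus i
    t_j = t_re_j + t_le_j    # marginal for stimulus j
    phi_d = (t_re_i - t_re_j) / math.sqrt(t_re * t_le)
    phi_e = (t_re_i - t_le_i) / math.sqrt(t_i * t_j)
    return phi_d, phi_e, n_total


def chi_square(phi, n_total):
    """Chi-square (1 df) corresponding to a phi coefficient."""
    return n_total * phi ** 2


# Hypothetical scores: 10 responses per channel assignment.
phi_d, phi_e, n_total = dominance_indices(8.0, 2.0, 4.0, 6.0)
print(round(phi_d, 3), round(phi_e, 3))
print(round(chi_square(phi_d, n_total), 2), round(chi_square(phi_e, n_total), 2))
```

In this example both coefficients stay close to the simple differences 2(T_RE(i) - T_RE(j))/N and 2(T_RE(i) - T_LE(i))/N. As a plausibility check on the assumed χ² relation, an ear dominance index of 0.13 with N = 420 (42 dichotic stimuli in 10 blocks, an inference not stated in the text) gives a value near the 7.1 listed for the first subject in Table 4.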

The denominator in the formula for the φ coefficient is the geometric mean of
the two unequal marginal sums in the contingency table (the other two marginals
being equal to N/2). Unless the difference between the marginals is very
large, their geometric mean is similar to their arithmetic mean, which equals
N/2. φ_D [Eq. (2)] is therefore usually well approximated by 2(T_RE(i) - T_RE(j))/N,
and φ_E [Eq. (3)] is usually almost identical to 2(T_RE(i) - T_LE(i))/N, except in
cases of extreme stimulus dominance. If the entries in the contingency table
are expressed as percentages (that is, divided by N/2), φ_D and φ_E can be
estimated at a glance. This relationship also justifies the calculation of an overall
index from summed response frequencies, which usually deviates only very
slightly from the average of the coefficients for individual stimulus pairs.


The crucial question was whether the REA obtained from Eq. (3) would be
comparable to the REA found in a two-response paradigm with a larger stimulus
ensemble. The results are shown in the left third of Table 4. The 13 subjects
exhibited a significant average REA, with six significant individual REAs but
only one significant left-ear advantage. These results were compared with
those of a recent study that used the complete set of six stop consonants and
reported the distribution of Kuhn's (1973) φ coefficient for 22 subjects
(Shankweiler and Studdert-Kennedy, 1975). The two distributions were virtually
identical (Mann-Whitney test: z = 0.03). To the degree that the two
ear-advantage indices are indeed equivalent, and within the limits imposed by the small
sample sizes, this comparison indicates that dichotic fusions show just the same
degree of average REA as less completely fused syllables (which make up the
majority of the combinations of all six stop consonants), so that
phenomenological fusion is probably unrelated to the degree of REA obtained. The smaller
REAs reported for place contrasts in the past were most likely artifacts of the
two-response requirement and of the ear-advantage indices used.



TABLE 4: Dichotic ear dominance indices, and dichotic and mixed stimulus
dominance indices for individual subjects.

              Dichotic ear dom.        Dichotic stim. dom.         Mixed stim. dom.
Subject No.   φ_E    χ²(1)   p         φ_D       χ²(1)     p       φ_D    χ²(1)   p

 1            0.13    7.1   <.01      [illeg.]  [illeg.]  <.05     0.04    0.7   n.s.
 2            0.01    0.0   n.s.      [illeg.]   29.3     <.001    0.07    1.8   n.s.
 3           -0.04    0.7   n.s.      -0.37      57.7     <.001   -0.48   98.1   <.001
 4            0.06    1.6   n.s.       0.02       0.2     n.s.        (excluded)
 5            0.10    4.5   <.05       0.19      15.6     <.001       (excluded)
 6           -0.06    1.3   n.s.       0.19      15.2     <.001   -0.01    0.0   n.s.
 7            0.14    7.8   <.01       0.14       8.4     <.01     0.22   26.8   <.001
 8            0.10    4.1   <.05       0.13       7.5     <.01     0.30   37.0   <.001
 9           -0.03    0.3   n.s.      -0.15      10.0     <.01     0.35   51.4   <.001
10            0.18   13.2   <.001      0.18      14.3     <.001       (no data)
11            0.22   20.9   <.001     [illeg.]    0.2     n.s.        (no data)
12            0.06    1.5   n.s.      [illeg.]   60.2     <.001       (no data)
13           -0.12    6.5   <.02       0.03       0.5     n.s.        (no data)

Total         0.06   17.9   <.001      0.09      41.7     <.001    0.07   14.0   <.001
BHR (3-F)a    0.11   25.2   <.001      0.05       4.9     <.05        (not calculated)
BHR (2-F)b    0.03    1.0   n.s.       0.11      15.1     <.001       (not calculated)

a Three-formant syllables, calculated from the totals over five sessions.
b Two-formant syllables, calculated from the totals over three sessions.

Table 4 also shows a highly significant REA for BHR. Interestingly,
however, his REA with two-formant stimuli was much smaller and did not reach
significance. This finding, which suggested that auditory stimulus complexity may
influence the REA, was followed up in Experiment IV.




Actually, the ear advantages were slightly underestimated because one-step
contrasts, which were mostly within categories (e.g., 1 + 2 and 3 + 4), were
included. Of the 21 individual stimulus pairs, 20 showed a positive average φ_E.
There was a tendency toward larger REAs with increasing separation of the
component stimuli on the continuum: the average φ_E increased from +0.04 (two-step
pairs) to +0.08 (three-step pairs) to +0.11 (four-, five-, and six-step pairs),
despite the occurrence of uninformative psychoacoustic fusions at the largest
separations. Hence, acoustic stimulus disparity may play a role in determining
the magnitude of the REA, a question of considerable theoretical importance that
deserves further study.

Table 4 (center) also shows the average stimulus dominance (φ_D) indices for
the individual subjects. These indices express the average dominance of i over
j, summed over all i<j; or, in other words, the degree of perceptual dominance
of lower-frequency transitions over higher-frequency transitions (assuming
that competition between F3 transitions plays only a minor role). This average
index is rather crude, but it captures some striking individual differences.
The overall φ_D was positive and highly significant, indicating strong dominance
of lower-frequency transitions. However, 2 of the 13 subjects had highly
significant negative coefficients.


The φ_D indices for the individual stimulus pairs, which were of primary
interest, were by no means homogeneous, as was already evident from Figure 1.
Only a few pairs were in perceptual equilibrium (φ_D = 0), and stimulus dominance
effects were considerably stronger than ear dominance effects. The stimulus
dominance pattern for a subgroup of 7 of the 13 subjects is illustrated in
Figure 2 (filled triangles). This subgroup was selected for reasons of
comparison with the results of Experiment II; their data are representative of all 13
subjects, except that the average φ_D was somewhat reduced. Discussion of the
dominance pattern will be reserved for the General Discussion section following
the description of Experiment II.

EXPERIMENT II

The relatively low percentages of psychoacoustic fusions in Experiment I
may have been due to random variability in ear dominance or stimulus dominance
from trial to trial. Psychoacoustic fusions may occur only when the two
syllables receive very nearly equal weights in the hypothetical auditory averaging
process; a slight tip of the balance in favor of one stimulus may lead to
perceptual dominance of that stimulus. However, when the two syllables in a pair
are acoustically combined before they reach the ear, the potential factor of
variability in ear dominance is excluded. In addition, auditory averaging may
occur at a more peripheral stage and may reduce any variability arising at more
central levels. Therefore, this hypothesis predicted an increase in
psychoacoustic fusions for mixed stimuli.

A comparison of dichotic and mixed pairs promised to be interesting with
respect to the whole "dominance pattern" of individual stimulus combinations.
The peripheral interactions coming into play in the mixed mode (acoustic
interference, auditory masking) may well lead to an entirely different response
pattern than in the dichotic mode. On the other hand, any significant
similarities between the two situations will have to be ascribed to common central
processing levels.




Method

Subjects. Nine of the thirteen subjects in Experiment I participated, one
of them prior to Experiment I. The data of one additional subject were
eliminated because they were too noisy.

Materials . The same stimulus tape as in Experiment I was used. 

Procedure. The procedure was identical with that of Experiment I except
that the output of the two tape recorder channels was mixed electronically and
presented binaurally. The intensity was readjusted to about 85 dB SPL. Special
care was exercised in equating the intensities of the two channels before they
entered the mixer. There was no reversal of channels here.

Results and Discussion

Controls. A comparison of the response distributions for pairs of
identical syllables in the dichotic and mixed conditions revealed significant
differences for six of the seven syllables. However, the changes consisted primarily
in a reduction of the "noise" and an increase in response consistency, so that
familiarity and practice were the most likely cause. In view of these changes
in the "baseline" scores, it was especially important to compare the response
patterns in the two conditions by means of a measure that takes these changes
into account. This was achieved by weighting the data as in Experiment I [cf.
Eq. (1)], with "channels" replacing "ears." Subsequently, φ_D and φ_C ("channel
dominance") coefficients were calculated [cf. Eqs. (2) and (3)].

While, at the levels used here, intensity differences of a few decibels
have little effect in dichotic listening (Speaks and Bissonette, 1975), the
mixing procedure was likely to be sensitive to small channel imbalances. The
φ_C coefficients served as a check on the proper equalization of the two channels
prior to mixing. Two subjects indeed showed highly significant φ_C coefficients,
both in the same single session. This indicated a calibration error, and the
data were excluded from further consideration.

Psychoacoustic fusions. Table 5 compares the responses to the four
/bæ + gæ/ pairs in the dichotic and mixed conditions for the same seven subjects.
Surprisingly, D responses were clearly less frequent in the mixed condition than
in the dichotic condition, with B responses making up for most of the
difference. This was probably not a practice effect, since BHR (who again participated
in five sessions) showed precisely the same decline in psychoacoustic fusions
(Table 3, left portion), and a correction for expected D confusions did not
eliminate the difference. It may be noted that the data of Halwes (1969)
showed a similar reduction in psychoacoustic fusions for mixed syllables.

Since it was conceivable that again the presence of a third formant somehow
played a role, BHR once more served as a control subject and listened to mixed
two-formant syllables (30 blocks in 3 sessions). The results showed an increase
in psychoacoustic fusions with respect to mixed three-formant syllables but a
reduction with respect to dichotic two-formant syllables (Table 3). This shows
that the reduction was not due to a change in the salience of the third formant.

The stimulus dominance pattern. The overall φ_D coefficient was again
significant and in favor of the stimuli with the lower numbers on the continuum

TABLE 5: Response percentages for four dichotic /bæ + gæ/ pairs: within-subject
comparison of dichotic and mixed conditions (seven subjects).

                                  Responses
Stimulus pair          Dichotic                  Mixed
                    B      D      G           B      D      G

   2 + 6          32.1   32.1   35.8        42.1   19.3   38.6
   2 + 7          38.6   32.1   29.3        46.4   17.9   35.7
   1 + 6          55.7   16.4   27.9        76.4    2.9   20.7
   1 + 7          50.7   15.0   34.3        59.3    5.7   35.0



(lower-frequency F2 dominance) but slightly reduced in comparison to the dichotic
condition (Table 4). Again, there were large individual differences, also from
one condition to the other (cf. Table 4).

The stimulus dominance indices for the individual stimulus pairs in the two
conditions are compared in Figure 2. The φ_D values in Figure 2 represent the
dominance of the stimulus held constant in each panel over the stimuli on the
abscissa. (Each individual stimulus combination, i + j, may be found twice in
Figure 2, once in the panel for i with j on the abscissa, and once in the panel
for j with i on the abscissa, with a φ_D coefficient of opposite sign. Of course,
φ_D = 0 for identical pairs.) It is evident that, with few exceptions, the
functions for the mixed condition exhibit the same basic peaks and valleys as those
for the dichotic condition. There are some consistent differences as well,
primarily in pairs containing stimuli 1 and 2: in the mixed condition, these
/bæ/ stimuli showed increased dominance over /gæ/ stimuli (5, 6, 7) but reduced
dominance over /dæ/ stimuli (3, 4). The dominance relationship between /dæ/ and
/gæ/ stimuli did not change very much.

BHR's data were in excellent agreement with those of the seven subjects.
The dominance pattern of BHR's two-formant results was virtually identical to
that of his three-formant results, in both the dichotic and mixed conditions,
suggesting a negligible role of the third formant apart from its effect on the
frequency of psychoacoustic fusions (which were neutral with regard to dominance
relationships). Consequently, the differences between the dichotic and mixed
conditions were the same for two-formant and three-formant syllables.

GENERAL DISCUSSION: I. DICHOTIC INTEGRATION

It was noted earlier that a simple "auditory averaging" model (which
assumes that a single auditory stimulus, somehow intermediate between the
component stimuli, is interpreted phonetically) is somewhat inadequate in
explaining the data. It predicts more psychoacoustic fusions than were actually
obtained, especially in the mixed condition where auditory averaging should have
been perfect, and it cannot account for the effect of stimulus dissimilarity on
psychoacoustic fusions (found also by Cutting, 1976). The model may be modified
to include random variation in the weights of the averaging process, although
the source of the variation is obscure in the mixed condition. Alternatively,
one could assume that, in analogy to vision, fusion (auditory averaging)
alternates with rivalry (dominance), the probability of rivalry increasing with

[Figure 2]



stimulus dissimilarity (Cutting, 1976). While this would account for the
pattern of psychoacoustic fusions, the usefulness of a special model for this
specific phenomenon is limited. Clearly, psychoacoustic fusions should be
explainable by the same principles of interaction as other responses. In other
words, an appropriate model should explain the total dominance pattern.

The simple auditory averaging model and Cutting's fusion-rivalry model
allow for variable dominance relationships between pairs of stimuli, but only
in a form that is related to auditory parameters. For example, consider the
dominance function for stimulus 1 in Figure 2: since the starting frequency of
the F2 transition increases monotonically with stimulus number, the dominance
function for 1 was expected to be a monotonic function (rising if lower
frequencies tend to dominate higher frequencies, and falling if the opposite is
true). Because of the possibly special status of straight formants, a smooth
curvilinear function would also be reasonable. (BHR's data suggested that the
third formant played only a negligible role.) However, there is no
straightforward auditory explanation for the abrupt and striking dip of the
function at stimulus 4 (that is, for the pair 1 + 4) and the equally abrupt
reversal at stimulus 5 (1 + 5). Similar observations may be made in several
other panels of Figure 2 (for example, panels 3 and 6). The data from the mixed
condition weigh especially heavily here. Apparently, then, even when two stimuli
are acoustically superimposed and/or perceived as a single syllable, the
perceptual mechanism does not treat the composite information simply as the
auditory average of its two constituents.

Therefore, we must turn to a different model. The model to be suggested
assumes that the acoustic cues of the component stimuli remain independent and
largely intact beyond the auditory processing stage, even in mixed syllables,
where a stimulus with a rising transition plus one with a falling transition
results in a fused stimulus with both a rising and a falling transition. We
assume that to this composite information a pattern recognition process is
applied that consists in comparing it with "ideal" representations ("prototypes"
or "schemata"; cf. Posner, 1969; Rosch, 1975) of the relevant speech sounds in
long-term memory. From these ideal representations or prototypes, the one is
selected that matches the input most closely.



This process of speech recognition can be conceived as active or as passive
(Morton and Broadbent, 1967). The active form is usually referred to as
analysis-by-synthesis, pattern matching, or hypothesis testing. The passive form,
which is preferred here on heuristic grounds, may be formulated in terms of
Morton's "logogen model" (Morton, 1969) or in terms of banks of selectively
tuned feature detectors (e.g., Cooper and Nager, 1975). An equivalent but more
abstract conception is in terms of a multidimensional perceptual space whose
dimensions are the derived auditory characteristics of the relevant set of
speech sounds. The relevant response alternatives are located as fixed "ideal
points" in this n-dimensional space, while an incoming stimulus generates a
point at some location corresponding to its auditory properties. Because
synthetic stimuli are acoustically much simpler than real speech (which the
prototypes represent), they will be mapped into a subspace of lower
dimensionality, for example, an F2-F3 transition-frequency plane, in the present
case. The distances from the stimulus point to all prototypes are assessed in
parallel, and a subsequent decision process selects the prototype with the
shortest associated distance as response. A more concrete conceptualization of
the calculation of distances is in terms of a "spread of excitation" from the
stimulus points, which leads the prototypes to be activated or to "resonate"
according to their distance from the stimulus point.

The model thus comprises three stages: (1) auditory processing, which maps
an acoustic stimulus into perceptual space; (2) multicategorical processing,
which generates a multicategorical vector of prototype activation values; and
(3) a (uni)categorical decision, which selects the response category by
determining the largest element in the multicategorical vector. (Stages 2 and 3
constitute what has been traditionally called phonetic processing; see Pisoni,
1975; Studdert-Kennedy, in press.)
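
The three stages can be illustrated with a minimal sketch. It rests on
assumptions that are not part of the model as stated: the perceptual space is
collapsed to a single dimension (position on the seven-step continuum), the
prototype locations and the width of the spread of excitation (PROTOTYPES and
SPREAD) are hypothetical values, and activation is taken to fall off as a
Gaussian function of distance.

```python
import math

# Hypothetical prototype locations on a one-dimensional perceptual axis,
# expressed in continuum steps (assumed values, not taken from the data).
PROTOTYPES = {"B": 1.0, "D": 4.0, "G": 7.0}
SPREAD = 1.5  # assumed width of the spread of excitation


def auditory_stage(stimulus_number):
    """Stage 1: map an acoustic stimulus to a point in perceptual space."""
    return float(stimulus_number)


def multicategorical_stage(point):
    """Stage 2: return the vector of prototype activation values."""
    return {
        category: math.exp(-((point - location) ** 2) / (2 * SPREAD ** 2))
        for category, location in PROTOTYPES.items()
    }


def decision_stage(activation):
    """Stage 3: select the category with the largest activation."""
    return max(activation, key=activation.get)


def identify(stimulus_number):
    """Run a single stimulus through all three stages."""
    point = auditory_stage(stimulus_number)
    return decision_stage(multicategorical_stage(point))


if __name__ == "__main__":
    for stimulus in range(1, 8):
        print(stimulus, identify(stimulus))
```

With these assumed values, stimuli 1, 4, and 7 are assigned to B, D, and G,
respectively; moving a prototype or changing SPREAD shifts the category
boundaries.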

Random variability may arise at any of the three processing levels: in the
representation of the stimulus points in perceptual space ("perceptual noise";
cf. Repp, 1975b), or in the baseline activation levels of the prototypes, or
perhaps in the final decision process itself. These details will not concern us
further here. The point to be made is that a stochastic pattern recognition
model of this sort may provide a useful framework for explaining the speech
recognition process, even when applied in an informal (that is, nonnumerical)
fashion.

This model should apply to dichotic fusions or mixed stimuli as well as to
any single input. Since the location of a stimulus point in the hypothetical
perceptual space is determined by its derived auditory characteristics (its
"acoustic cues"), and since fused or mixed stimuli contain multiple cues (for
example, two different transitions of the same formant), they will lead to two
stimulus points in perceptual space. The listener is usually not aware of this
fact but only of the perceptual outcome, which will be determined by the
prototype that reaches the highest level of activation from the simultaneous
presence of the two stimulus points.

This assumption predicts the most important feature of the data: the
pattern of dominance relationships. The model implies that, of two fused stimuli,
the stimulus that is closer to a prototype in perceptual space will dominate.
In other words, stimuli close to a category boundary and far from the category
prototypes will tend to be dominated by stimuli that are far from a category
boundary and close to a prototype. This is what Figure 2 seems to show, on the
whole. Stimulus 1, for example, dominates 5 precisely because the latter is
ambiguous, whereas it does not dominate 4, which is a good /dae/; and it
dominates 7 only slightly, since 7 is a good /gae/. Stimulus 2, which is a less



This "holistic" model automatically takes into account certain interactions
between the processing of different features of a speech sound. An alternative
model might postulate that "multicategorical" processing takes place at the
auditory level, by means of selectively tuned feature detectors (e.g., Cooper
and Nager, 1975) that act as auditory prototypes. This auditory stage would
then be followed by a series of feature decisions whose outcomes are finally
combined into a response. However, this model would have to explain why the
feature detectors are selectively tuned as they are, and it would have to
include additional mechanisms for the interaction of different feature decisions.
It is worthwhile, therefore, to adopt the holistic model as a working
hypothesis, until there is sufficient reason to reject it. We cannot decide
between the two models on the basis of the present data because only a single
feature is involved.






perfect /bae/, tends to be dominated by most other stimuli, and so on. The
predictions of the model are not confirmed in every detail, but they nevertheless
seem to provide the best explanation of the overall pattern.
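
The dominance prediction can be sketched numerically under simplifying
assumptions: a one-dimensional perceptual space, hypothetical prototype
locations at continuum positions 1, 4, and 7, a Gaussian spread of excitation
of assumed width, and simple summation of the excitation from the two stimulus
points. The function `dominance` is an illustrative index, not the φD
coefficient used in the analyses.

```python
import math

PROTOTYPES = {"B": 1.0, "D": 4.0, "G": 7.0}  # hypothetical locations
SPREAD = 1.5  # assumed width of the spread of excitation


def excitation(point, location):
    """Excitation a stimulus point delivers to a prototype (assumed Gaussian)."""
    return math.exp(-((point - location) ** 2) / (2 * SPREAD ** 2))


def category_of(stimulus):
    """Category a stimulus would receive when presented alone."""
    return max(PROTOTYPES, key=lambda c: excitation(stimulus, PROTOTYPES[c]))


def pair_activation(first, second):
    """Each prototype integrates the excitation from both stimulus points."""
    return {
        category: excitation(first, location) + excitation(second, location)
        for category, location in PROTOTYPES.items()
    }


def dominance(first, second):
    """Activation advantage of the first stimulus's category over the second's."""
    activation = pair_activation(first, second)
    return activation[category_of(first)] - activation[category_of(second)]


if __name__ == "__main__":
    for pair in [(1, 5), (1, 4), (2, 4)]:
        print(pair, round(dominance(*pair), 3))
```

In this sketch stimulus 1 has the advantage over the ambiguous stimulus 5, is
evenly matched with stimulus 4, and stimulus 2 is dominated by 4, which mirrors
the qualitative pattern described above.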

However, there are other features that the model cannot explain as it
stands. Note that stimulus 7 is dominated most strongly by 3, while 1 is
dominated most strongly by 4 (Figure 2). In addition, psychoacoustic fusions and
the differences between dichotic and mixed pairs need to be accounted for.

Psychoacoustic fusions are explained as follows: if a stimulus in isolation
receives 100 percent B responses, this does not mean that only the B prototype
has been activated by this stimulus. Because of the hypothetical spread of
excitation, all prototypes will be activated to some degree; but if the
stimulus is sufficiently close to the B prototype and the noise in the system
not too high, the activation levels of the other prototypes will never exceed
the level of the B prototype. However, in dichotic competition the activation
resulting from the two stimulus points will be integrated by the prototypes,
and since the D prototype is likely to lie somewhere between the B and G
prototypes in perceptual space, it will profit most from this integration. If
both the /bae/ and the /gae/ stimulus in a pair are close to the D boundary, their
joint activation of the D prototype may even exceed that of the B and G
prototypes. So, for example, 2 + 7 should yield more D responses than 1 + 7; and
2 + 6, more than 1 + 6, which was in fact obtained (cf. also Cutting, 1976). The
component stimuli, 6 and 7, on the other hand, had no differential effect on D
responses, which seems to imply that their activation of the D prototype was
equal in degree. This is not quite in accord with the model, but it is plausible
that differences at higher frequencies have a smaller effect than differences at
lower frequencies.
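
A small numerical sketch shows how summed excitation can favor the D prototype
in /bae + gae/ pairs. Everything quantitative here is an assumption made for
illustration: the one-dimensional space, the prototype positions, the Gaussian
spread, and the conversion of activation into a relative share (`d_share`),
which stands in for response probability.

```python
import math

PROTOTYPES = {"B": 1.0, "D": 4.0, "G": 7.0}  # hypothetical locations
SPREAD = 1.5  # assumed width of the spread of excitation


def excitation(point, location):
    """Excitation a stimulus point delivers to a prototype (assumed Gaussian)."""
    return math.exp(-((point - location) ** 2) / (2 * SPREAD ** 2))


def pair_activation(first, second):
    """Summed excitation of each prototype from the two stimulus points."""
    return {
        category: excitation(first, location) + excitation(second, location)
        for category, location in PROTOTYPES.items()
    }


def d_share(first, second):
    """Relative share of the D prototype in the total activation."""
    activation = pair_activation(first, second)
    return activation["D"] / sum(activation.values())


if __name__ == "__main__":
    for pair in [(2, 6), (2, 7), (1, 6), (1, 7)]:
        print(pair, round(d_share(*pair), 3))
```

With these assumed values the D prototype receives the largest summed
activation for 2 + 6, and the D share is larger for 2 + 7 than for 1 + 7 and
larger for 2 + 6 than for 1 + 6. The sketch also predicts a difference between
6 and 7 that the data did not show, in line with the remark that this result is
not quite in accord with the model.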

The same reasoning explains why 1 was dominated most strongly by 4, but 7
was dominated most strongly by 3. Clearly, 3 is more likely to activate the B
prototype than 4, so that, in the pair 1 + 3, the B activations will summate and
outweigh the D activation due primarily to 3 alone. In 1 + 4, 4 will contribute
less to the activation of the B prototype and will have a stronger stand
against B. The opposite argument applies when 3 and 4 are paired with 7.
(These relationships are also predicted by the auditory averaging model.)

The prototype model cannot account for the differences between the dichotic
and mixed conditions. Most likely, this difference can be traced back to
peripheral auditory masking, which comes into play in the mixed condition. The
data suggest that, in mixed syllables, rising transitions (in stimuli 1 and 2)
tended to mask (dominate) falling transitions, and relatively flat formants
(stimuli 3 and 4) tended to mask rising transitions. The first effect may
reflect the "upward spread of masking" familiar from the auditory masking
literature, while the second effect may reflect a higher susceptibility to
masking of transitions in general, as compared to steady-state formants. The
reduction in psychoacoustic fusions in the mixed condition was most likely due
to the masking of /gae/ by /bae/, so that B responses increased at the expense of
D and G responses.
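
The size of the shift can be checked directly from the response percentages for
the four /bae + gae/ pairs in the dichotic and mixed conditions. The sketch
below only averages the tabled values; the variable and function names are
illustrative.

```python
# Response percentages (B, D, G) for the four /bae + gae/ pairs, transcribed
# from the table comparing the dichotic and mixed conditions.
DICHOTIC = {
    "2+6": (32.1, 32.1, 35.8),
    "2+7": (38.6, 32.1, 29.3),
    "1+6": (55.7, 16.4, 27.9),
    "1+7": (50.7, 15.0, 34.3),
}
MIXED = {
    "2+6": (42.1, 19.3, 38.6),
    "2+7": (46.4, 17.9, 35.7),
    "1+6": (76.4, 2.9, 20.7),
    "1+7": (59.3, 5.7, 35.0),
}


def column_means(table):
    """Mean B, D, and G percentages over the four stimulus pairs."""
    rows = list(table.values())
    return tuple(sum(column) / len(rows) for column in zip(*rows))


if __name__ == "__main__":
    for name, table in (("dichotic", DICHOTIC), ("mixed", MIXED)):
        b, d, g = column_means(table)
        print(f"{name:9s} B={b:.1f} D={d:.1f} G={g:.1f}")
```

On these figures the mean percentage of D responses drops from about 24 to
about 11, while the mean percentage of B responses rises from about 44 to
about 56.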

The results pertaining to ear advantages will be discussed after two
additional experiments have been reported.




EXPERIMENT III

This brief experiment served to demonstrate what had been based only on
introspective evidence in Experiment I, viz. that dichotic fusions are difficult
or impossible to discriminate from binaural syllables. In the Introduction, I
have referred to the results of several experiments that seemed to show that
place contrasts frequently, but not always (in about 60 percent of the cases),
sound like a single syllable (Halwes, 1969; Blumstein and Cooper, 1972; Repp,
1976). However, these studies did not differentiate between voiced and
voiceless place contrasts (the latter may be less completely fused than the
former), and they employed only a single, unambiguous token from each category,
so that the frequent ambiguity of dichotic fusions may have assumed the role of
a distinctive cue. To test the proposition that binaural and dichotic pairs
cannot be distinguished, ambiguity must be made irrelevant. This is at least
partially achieved by using syllables from a place continuum, so that at least
one of the identical pairs will be ambiguous (stimulus 5, in the present case).
The false-alarm rates ("different" responses) for this ambiguous pair should
reveal whether the ambiguity cue plays any role.

Method

Subjects. Eight subjects (four men and four women) participated who had
not taken part in Experiments I and II. All subjects were right-handed and
without hearing trouble, with the exception of one subject who claimed to have
a 5-dB hearing loss in the right ear.

Materials. The stimulus tape of Experiment I was used.

Procedure. This discrimination task was appended to Experiment IV, taking
up the last 20 minutes of a session. Each subject listened first to one block
of 84 syllable pairs (half identical, half nonidentical) and wrote down "1" when
he thought a pair consisted of two identical syllables and "2" when it consisted
of two different syllables. (To avoid confusion with the stimulus numbers,
these responses will be referred to as "same" and "different," respectively.)
During the next block of 84 syllable pairs, the subject merely followed the
correct responses that had been filled in on the answer sheet. After this
feedback trial, another block of judgments followed. The subjects were instructed
that there was an equal number of identical and nonidentical pairs, and that
ambiguity was not an indication that two different syllables had been presented.

Results and Discussion 

As predicted, average performance was very poor, although slightly above
chance (56 percent correct). The performance of three individual subjects was
significantly above chance (67, 62, and 58 percent correct, respectively). BHR,
who participated in four sessions, performed at chance level (51 percent
correct), and so did another highly experienced listener who listened informally.
The feedback did not improve performance.

A more detailed analysis was conducted in order to find out whether
ambiguity played a role and whether accuracy increased with the acoustic
dissimilarity of the syllables in a pair. The data are shown in Table 6. The most
ambiguous identical pair, 5 + 5, did not show an increased false-alarm rate,
suggesting that ambiguity did not serve as a distinctive cue in this task. On the
other hand, the "hit rate" for nonidentical pairs increased monotonically with
the number of steps separating the two syllables in a pair. At first glance,
this seemed to suggest that within-pair acoustic dissimilarity played a role.
However, a closer look at the data showed that this was probably not true, and
that the result was due to the confounding of acoustic separation with the
acoustic characteristics of the component syllables. (Pairs with large
separations did not contain any stimuli from the middle of the continuum.) Table 6
shows that both the hit rates for nonidentical pairs and the false-alarm rates
for identical pairs were greatly increased when a pair contained stimulus 1,
indicating a strong bias to respond "different." Hit rates were also increased
for most pairs containing stimuli 2 or 7, relative to the remaining pairs.
However, within these groups of pairs (holding one stimulus constant), no clear
relation to acoustic dissimilarity could be discerned.



TABLE 6: Percentages of "different" responses to nonidentical pairs ("hits,"
off-diagonal) and identical pairs ("false alarms," diagonal).

                           Stimulus number
Stimulus
number        1     2     3     4     5     6     7

   1         61
   2         69    54
   3         75    50    40
   4         75    31    28    26
   5         69    50    22    28    30
   6         75    50    38    34    25    21
   7         81    50    47    53    47    16    19
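
The relation between hit rate and acoustic separation, and its confounding with
the identity of the component stimuli, can be checked from the off-diagonal
entries of Table 6. The sketch below simply averages the tabled percentages;
the function names are illustrative.

```python
# Off-diagonal entries of Table 6: percentage of "different" responses (hits)
# for each nonidentical pair, keyed as (higher stimulus, lower stimulus).
HITS = {
    (2, 1): 69,
    (3, 1): 75, (3, 2): 50,
    (4, 1): 75, (4, 2): 31, (4, 3): 28,
    (5, 1): 69, (5, 2): 50, (5, 3): 22, (5, 4): 28,
    (6, 1): 75, (6, 2): 50, (6, 3): 38, (6, 4): 34, (6, 5): 25,
    (7, 1): 81, (7, 2): 50, (7, 3): 47, (7, 4): 53, (7, 5): 47, (7, 6): 16,
}


def mean_hit_rate(pairs):
    """Mean hit percentage over a list of pairs."""
    values = [HITS[pair] for pair in pairs]
    return sum(values) / len(values)


def mean_hit_by_separation():
    """Mean hit percentage for each number of steps separating the pair."""
    return {
        steps: mean_hit_rate([p for p in HITS if p[0] - p[1] == steps])
        for steps in range(1, 7)
    }


def with_and_without(stimulus):
    """Mean hit percentage for pairs containing / not containing a stimulus."""
    containing = [p for p in HITS if stimulus in p]
    others = [p for p in HITS if stimulus not in p]
    return mean_hit_rate(containing), mean_hit_rate(others)


if __name__ == "__main__":
    print(mean_hit_by_separation())
    print(with_and_without(1))
```

The mean hit rate rises with separation (36.0 at one step to 81.0 at six
steps), but pairs containing stimulus 1 average 74.0 against about 37.9 for all
other pairs, which is the confounding described in the text.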


The most likely explanation of this pattern of results is that the stimuli
from the ends of the continuum had some peculiar acoustic properties, perhaps
owing to the steep slope of their transitions. This artifact, which may have
been due to limitations of the synthesizer or may have been psychoacoustic in
nature, was apparently interpreted incorrectly as a relevant cue. The only
exceptions to this interpretation are the very low rates of "different" responses
to the pairs 6 + 7 and 7 + 7 (Table 6).

Apart from this issue, the data do provide some evidence of
better-than-chance performance of some subjects, which remains an astonishing and
somewhat puzzling feat. For all practical purposes, however, it may be concluded
that dichotic voiced place contrasts are perceived as single syllables.

EXPERIMENT IV

The fourth experiment served three purposes. First, it attempted to
demonstrate the ineffectiveness of selective-attention instructions with dichotic
fusions. Although Halwes (1969: Experiment 5) found no effect of selective
attention in "fused" syllable pairs, a subsequent experiment of his showed a
slight effect (Halwes, 1969: Experiment 6). His stimuli actually included all
six stop consonants and were called "fused" only because they had the same
fundamental frequency. Repp (1973, 1976) has also demonstrated small
selective-attention effects for such stimuli. The question here is whether the
components of perfectly fused voiced place contrasts can be attended selectively.



The second purpose was a test of the hypothesis suggested by BHR's smaller
REA for two-formant stimuli than for three-formant stimuli in Experiment I
(Table 4). It may be that stimulus complexity (which in turn may be related to
speech-likeness and naturalness) is positively correlated with the REA obtained.
For this purpose, two-formant and three-formant pairs were compared in the same
design. The role of the third formant in stimulus dominance relationships was
also of interest.

The third purpose of Experiment IV was simply to create a more typical test
situation, using only one token from each category, in order to find out how
serious the problems of stimulus dominance, stimulus heterogeneity, and
individual differences actually are in this more "natural" setting. Any such
problems encountered should reinforce the methodological suggestions to be made
in the final Discussion.

Method

Subjects. The same subjects as in Experiment III participated. However,
the data of one subject who did not hear any /gae/s at all were excluded and
replaced by data for BHR as a subject (from the first of four sessions in which
he participated).

Materials. The stimuli were three syllables, with or without third
formants, from the same place continuum as in the earlier experiments. The /dae/
was stimulus 4 of Table 1, the /bae/ had slightly more extreme transitions than
stimulus 1 of Table 1 (starting frequency of F2: 1232 Hz; F3, if present:
2180 Hz), and the /gae/ was intermediate between stimuli 6 and 7 (F2: 2156 Hz;
F3: 2180 Hz).

The experimental tape contained a brief monaural practice list of 30 random
syllables (five replications of each of the six stimuli). This was followed by
two blocks of 180 dichotic pairs. Each block contained 10 subblocks, each
representing a different randomization of 18 dichotic pairs made up from the nine
possible combinations of the three syllables with two formants and with three
formants, respectively. (Two-formant and three-formant stimuli were never
paired with each other.)
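
The composition of one block can be sketched as follows. The syllable labels,
the tuple layout, and the use of a seeded random generator are illustrative
assumptions; the text specifies only the counts and the constraint that
two-formant and three-formant stimuli are never paired with each other.

```python
import itertools
import random

SYLLABLES = ["bae", "dae", "gae"]
FORMANT_COUNTS = [2, 3]


def make_subblock(rng):
    """One randomization of the 18 pairs: 9 ear-by-ear combinations of the
    three syllables, once with two formants and once with three formants."""
    pairs = [
        (left, right, n_formants)
        for n_formants in FORMANT_COUNTS
        for left, right in itertools.product(SYLLABLES, repeat=2)
    ]
    rng.shuffle(pairs)
    return pairs


def make_block(rng, n_subblocks=10):
    """A block of 10 independently randomized subblocks (180 pairs)."""
    return [pair for _ in range(n_subblocks) for pair in make_subblock(rng)]


if __name__ == "__main__":
    block = make_block(random.Random(0))
    print(len(block), block[:3])
```

Because both members of a pair share one formant count by construction, mixed
two-formant and three-formant pairs cannot occur.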

Procedure. After trying to identify the practice syllables (and repeating
the series, if necessary), the subjects listened twice to the experimental tape,
that is, to four blocks of 180 dichotic pairs. For two of these blocks, the
subjects were instructed to shift their attention to one side, by whatever means
they found suitable. It was explained that the syllables actually consisted of
two different inputs, and that only the syllables in the designated ear were to
be identified. In the remaining two blocks, no selective attention was required,
and the subjects simply wrote down what the fused syllables sounded like. The
sequence of attention/no-attention conditions and of left-ear and right-ear
selective attention was counterbalanced across subjects.

Results

The data were analyzed as in Experiment I. There was a significant overall
REA (φE = +0.07, p < .01). Five of the eight subjects showed significant REAs,
one subject a significant LEA. The hypothesis of a difference in REA for
two-formant and three-formant syllables was not confirmed. Although individual
subjects showed considerable differences, the average φE indices were identical.
BHR even showed a slightly larger REA with two-formant syllables, contrary to
the opposite difference in Experiment I, which had given rise to the hypothesis
in the first place.

The effect of selective attention was very peculiar: the differences were
precisely in the wrong direction. The φE coefficients were +0.12 for left-ear
attention, +0.03 for right-ear attention, and +0.07 for no-attention. The
effect was very similar for two-formant and three-formant stimuli. However, no
individual subject showed any clear evidence of consistent positive or negative
selective attention effects, so that the inverted pattern may have been due to
chance. Two subjects showed an inversion of the REA as a function of selective
attention but regardless of the ear attended to.

The frequency of psychoacoustic fusions was low (12 percent), as expected
with acoustically dissimilar stimuli. This percentage excludes the data of BHR
who, as in Experiment I, showed a much higher frequency (35 percent). Quite
surprisingly, and contrary to BHR's control results in Experiment I,
psychoacoustic fusions were more frequent with three-formant than with two-formant
stimuli (15 vs. 8 percent for the seven subjects; 41 vs. 28 percent for BHR).

There was a reliable difference in the stimulus dominance pattern between
two-formant and three-formant syllables, which is shown in Table 7 and may be
characterized as a reduction in the "strength" of /dae/ when the third formant
was removed. This was already evident in the identification of binaural pairs:
the two-formant /dae/ received only 86 percent correct responses, while the
three-formant /dae/ received 94 percent. (The intelligibility of the other
stimuli did not change.) Table 7 shows that, with three formants present, /dae/
dominated /bae/ and /gae/. With two formants, the pattern was reversed. This
indicates that an F3 transition was more important for /dae/ than for /bae/ and
/gae/; and it supports the hypothesis, set forth earlier, that a poor
representative of a category will be dominated by better examples of other
categories. Again, however, there were large individual differences in dominance
patterns.

The φE coefficients for the three individual stimulus pairs (which were
similar for two- and three-formant stimuli) are also shown in Table 7.
Surprisingly, /dae + gae/ pairs did not exhibit an average REA. BHR (who
participated in four sessions) even showed a LEA with this pair, but a clear REA
with the other two. However, apart from BHR's data, this phenomenon was not
reliable for individual subjects, who showed large variations in their ear
advantages for individual pairs. Both the /dae + gae/ anomaly and the high
variability are somewhat disconcerting. It will be recalled that Experiment I did
not show any comparable effect.



This was the subject who claimed to have a 5-dB hearing loss in the right ear.
However, it would be quite surprising if this had been the cause of the
dichotic asymmetry, considering that channel differences much larger than 5 dB
have only little effect on the dichotic ear advantage at the intensities used
here (Speaks and Bissonette, 1975).





TABLE 7: Stimulus dominance indices for individual stimulus pairs, and ear
dominance indices (averaged over two- and three-formant stimuli). (Note:
A positive φD index indicates dominance of the stimulus named first.)

                      /bae + dae/    /dae + gae/    /bae + gae/

three-formant φD         -0.31          +0.40          +0.45
two-formant φD           +0.09          -0.31          +0.32
average φE               +0.14          -0.01          +0.11



GENERAL DISCUSSION: II. MEASURING THE EAR ADVANTAGE

The presence of a significant average REA for dichotic fusions is evidence
that, despite the subjective impression of a single syllable, the information
from the two ears remains functionally separated until it converges upon the
dominant hemisphere. It makes unlikely a low-level auditory mixing mechanism
that combines spectrally similar information and routes it to both hemispheres,
because such a mechanism would have to be influenced by hemispheric dominance.
Rather, it seems that each stimulus first arrives at the contralateral
hemisphere, and integration takes place only when the information is recombined
after considerable auditory (and perhaps even initial phonetic) processing in
each hemisphere, which has been a common assumption in dichotic listening
research (Studdert-Kennedy and Shankweiler, 1970). The REA for dichotic fusions
challenges an interpretation in terms of spatial location only (Morais and
Bertelson, 1973; Morais, 1975). Since only a single stimulus is heard that is
localized in the median plane, the hypothesis that stimuli that come from the
right are perceived more accurately does not apply.

The subjective phenomenon of fusion (hearing only a single stimulus)
probably does arise from a low-level cross-correlational mechanism, but it is
apparently separate from, and unrelated to, the subsequent allocation and
integration of information. This has two interesting implications: (1) in the
limiting case, identical binaural stimuli may also be independently transmitted
to their respective contralateral hemispheres and perceptually combined only at
a central level; and, more importantly, (2) the identification of less
completely fused dichotic stimuli (e.g., voicing contrasts) should be explainable
by the same principles as the identification of dichotic fusions, for example, by
the prototype model proposed earlier. This view is in basic agreement with the
conclusions of Halwes (1969), who found that subjective fusion versus nonfusion
was largely irrelevant to the pattern of responses.

It also follows from these conclusions that other types of dichotic
contrasts should lend themselves to the one-response, no-attention requirement
("What does it sound like?") whose advantages over the two-response paradigm
have already been outlined in the Introduction (cf. Geffner and Dorman, in press,
who used this method successfully with four-year-old children). However, what
makes voiced place contrasts especially convenient from a methodological
standpoint is (1) that the task is "natural" because the listeners are not aware
of different inputs to the two ears, (2) that the fused stimuli do not sound
strange (as other dichotic contrasts often do) but similar to binaural syllables,
(3) that they do not invite selective attention strategies (however ineffective
they may be), and (4) that relatively few responses are given that are ambiguous
with respect to ear dominance (psychoacoustic fusions). The last problem can be
completely eliminated by simply omitting /bae + gae/ pairs from dichotic tests. A
dichotic test composed only of /bae + dae/ and /dae + gae/ pairs, interspersed
with binaural controls, should be a useful instrument to try out.

However, such a test still presents some major problems. Foremost among
these is the phenomenon of stimulus dominance and the large individual variations
connected with it. Extreme dominance of one stimulus in a pair must be
prevented; otherwise, this dichotic pair will provide no information about ear
dominance. Then, there is the important question of the relationship between
stimulus dominance and ear dominance that parallels, but is not identical with,
the question of the relationship between performance level and ear dominance in
the two-response paradigm (Kuhn, 1973).¹⁰ Finally, there is the question of
item homogeneity: Do different dichotic pairs measure the ear advantage to the
same degree, even if they have equal stimulus dominance coefficients?

Unlike performance level in the two-response paradigm, which is a global
index and cannot be manipulated by the experimenter, stimulus dominance is a
characteristic of individual stimulus pairs and can be controlled to a certain
degree by manipulating stimulus parameters, as demonstrated in Experiment I.
There are two possible ways of making use of this control. One is to try to
minimize stimulus dominance and to bring all stimulus pairs as close to
equilibrium (φD = 0) as possible. Because of individual differences, construction
of a single optimal test is out of the question. An appropriate method would be
testing under computer control, where, during an initial adaptive phase of
testing, the computer keeps track of the responses and adjusts the stimulus
parameters to reduce asymmetries. Such a procedure is worth exploring but has
some drawbacks: it does not guard against drifts of stimulus dominance during
the actual testing phase, and it requires sophisticated equipment and, therefore,
is of little value outside the laboratory. The other alternative is to
construct a test containing a variety of stimuli, so that the individual pairs
span a wide range of stimulus dominance relationships (as in Experiment I). In
order to derive a valid measure of ear dominance in this case, the nature of the
relationship between stimulus dominance and ear dominance must be known. Since it
is reasonable to expect that ear dominance will be maximal when stimulus
dominance is minimal, a global φE index obtained from summed response frequencies
(as in Experiment I) or from averaged ear dominance coefficients for individual
pairs will underestimate the "true" ear advantage and will not be comparable
from individual to individual, because of different individual stimulus
dominance patterns. A method for inferring the true ear advantage is needed.

The situation is formally analogous to that in signal detection. Ear
dominance represents "sensitivity" and stimulus dominance represents "bias."
When there is extreme bias (φ_D = ±1), sensitivity cannot be determined
(φ_E = 0). When sensitivity is optimal (φ_E = ±1), there cannot be any bias
(φ_D = 0).






¹⁰The question of performance level also arises in the present paradigm, in the
form of confusions. As long as the confusions are not too numerous, however,
their impact is negligible because of the weighting procedure employed
[Eq. (1)]. There are some individuals, however, who seem to be unable to
give consistent identification responses to the synthetic syllables used
here.




Between these extremes, the two tendencies mutually constrain each other. For
example, wheti/T/i)/N « 0.8 (0d " +0>j75) , it can easily be shown that Tjyg/N is 
restricted the range b^ween and* 0.7 C<2^£ be^een ±0*5^); and 0£ constrains 

in a similar fashion. In order to apply the methods of signal detection
theory, one event (for example, responding i when i-j is presented, with i in
the right ear) may be arbitrarily chosen to represent "hits," and another event
(responding i when j-i is presented, with i in the left ear), "false alarms."
However, the crucial requirement is that sensitivity (namely, the "true" ear
advantage) be independent of the bias (stimulus dominance). Since stimulus
dominance is varied by changing the characteristics of the stimuli (rather than
by manipulating the listeners' criteria), it is an important empirical question
whether all items are homogeneous (in the test-theoretical sense) and measure
the same kind of ear advantage, so that all stimulus pairs can be represented
as points on the same single receiver-operating-characteristic function.
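
The mutual constraint between bias and sensitivity can be made concrete with a
small numerical sketch. It assumes a simplified setting that is not spelled out
in the text: a single pair i-j presented equally often in both ear
arrangements, no confusions, and plain response proportions rather than the
dominance coefficients themselves. The function name is illustrative only.

```python
def right_ear_proportion_range(p_i):
    """Admissible range of the right-ear response proportion.

    p_i is the overall proportion of "i" responses to the pair i-j, pooled
    over both ear arrangements (assumed equally frequent, no confusions).
    """
    if not 0.0 <= p_i <= 1.0:
        raise ValueError("p_i must lie between 0 and 1")
    # Let a = P(respond i | i in right ear) and b = P(respond i | i in left ear).
    # Stimulus dominance fixes (a + b) / 2 = p_i, and the right-ear proportion
    # is (a + (1 - b)) / 2 = 0.5 + (a - b) / 2. With a and b confined to
    # [0, 1], |a - b| cannot exceed 2 * min(p_i, 1 - p_i).
    half_width = min(p_i, 1.0 - p_i)
    return 0.5 - half_width, 0.5 + half_width


for p in (0.5, 0.8, 1.0):
    low, high = right_ear_proportion_range(p)
    print(f"p_i = {p:.1f}: right-ear proportion between {low:.2f} and {high:.2f}")
```

Under these assumptions, complete stimulus dominance (p_i = 1) pins the
right-ear proportion at 0.5, and an unbiased pair (p_i = 0.5) leaves the full
range open, which is the sense in which extreme bias leaves sensitivity
undeterminable.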

The results of the present experiments create some doubts about whether the
homogeneity assumption will be tenable. When plotted as "hits" versus "false
alarms," the stimulus pairs of Experiment I exhibited considerable scatter,
perhaps owing to the high individual variability in the data. There was also a
tendency for φ_E to increase with the acoustic dissimilarity of the component
stimuli in a dichotic pair. At the same time, there was no negative correlation
between φ_E and |φ_D| (r = +0.04), so that an increase in φ_E could not be
explained by a simultaneous decrease of dominance asymmetries. In Experiment IV,
one of the three stimulus pairs showed no REA. Again, this was not related to
stimulus dominance (cf. Table 7). As a result, no monotonic receiver-operating
characteristic function will fit these data well. Further research will be
required to determine the reliability of the present findings. It may be useful
to compare variations in stimulus dominance produced by varying stimulus
parameters with similar variations introduced by other means, such as adaptation
(Cooper, 1974; Miller, 1975).

A more explicit model of dichotic interaction would also contribute to the
solution of this methodological problem. In mathematical terms, stimulus
dominance (bias) and ear dominance (sensitivity) mutually constrain each other.
However, in the actual processing chain, the constraint may well be
unidirectional, since it is highly likely that the two asymmetries arise at
different stages in processing. Since stimulus dominance effects were more
pronounced than ear dominance effects but did not correlate with the latter, the
present data suggest that the cause of ear dominance precedes the cause of
stimulus dominance in the processing hierarchy. This is in agreement with the
hypothesis that ascribes ear dominance to transcallosal transmission loss but
stimulus dominance to subsequent integration of information in the dominant
hemisphere.

REFERENCES

Blumstein, S. and W. Cooper. (1972) Identification versus discrimination of
    distinctive features in speech perception. Quart. J. Exp. Psychol. 24,
    207-214.
Cooper, W. E. (1974) Adaptation of phonetic feature analyzers for place of
    articulation. J. Acoust. Soc. Am. 56, 617-627.
Cooper, W. E. and R. M. Nager. (1975) Perceptuo-motor adaptation to speech:
    An analysis of bisyllabic utterances and a neural model. J. Acoust. Soc.
    Am. 58, 256-265.
Cutting, J. E. (1972) A preliminary report on six fusions in auditory research.
    Haskins Laboratories Status Report on Speech Research SR-31/32, 93-107.
Cutting, J. E. (1976) Auditory and linguistic processes in speech perception:
    Inferences from six fusions in dichotic listening. Psychol. Rev. 83,
    114-140. [Also in Haskins Laboratories Status Report on Speech Research
    SR-44 (1975), 37-73.]
Geffner, D. S. and M. F. Dorman. (in press) Hemispheric specialization for
    speech perception in four-year-old children from low and middle
    socioeconomic classes. Cortex. [Also in Haskins Laboratories Status Report
    on Speech Research SR-42/43 (1975), 241-245.]
Haggard, M. (1975) The terrible truth about the masking of monosyllables.
    Speech Perception, Report on Speech Research in Progress (Psychology
    Department, The Queen's University of Belfast) Series 2, no. 4, 21-30.
Halwes, T. G. (1969) Effects of dichotic fusion on the perception of speech.
    Unpublished Ph.D. dissertation, University of Minnesota.
Harris, K. S., H. S. Hoffman, A. M. Liberman, P. C. Delattre, and F. S. Cooper.
    (1958) Effect of third-formant transitions on the perception of the
    voiced stop consonants. J. Acoust. Soc. Am. 30, 122-126.
Hoffman, H. S. (1958) Study of some cues in the perception of the voiced stop
    consonants. J. Acoust. Soc. Am. 30, 1035-1041.
Kuhn, G. M. (1973) The phi coefficient as an index of ear differences in
    dichotic listening. Cortex 9, 447-457.
Miller, J. L. (1975) Properties of feature detectors for speech: Evidence
    from the effects of selective adaptation on dichotic listening. Percept.
    Psychophys. 18, 389-397.
Morais, J. (1975) The effects of ventriloquism on the right-side advantage
    for verbal material. Cognition 3, 127-139.
Morais, J. and P. Bertelson. (1973) Laterality effects in diotic listening.
    Perception 2, 107-111.
Morton, J. (1969) Interaction of information in word recognition. Psychol.
    Rev. 76, 165-178.
Morton, J. and D. E. Broadbent. (1967) Passive versus active recognition
    models, or is your homunculus really necessary? In Models for the
    Perception of Speech and Visual Form, ed. by W. Wathen-Dunn. (Cambridge,
    Mass.: MIT Press).
Pisoni, D. B. (1971) On the nature of categorical perception of speech sounds.
    Unpublished Ph.D. dissertation, University of Michigan.
Pisoni, D. B. (1975) Dichotic listening and processing phonetic features. In
    Cognitive Theory, Volume 1, ed. by F. Restle, R. M. Shiffrin, N. J.
    Castellan, H. Lindman, and D. B. Pisoni. (Hillsdale, N. J.: Lawrence
    Erlbaum Assoc.).
Pisoni, D. B. and S. D. McNabb. (1974) Dichotic interactions of speech sounds
    and phonetic feature processing. Brain Lang. 1, 351-362.
Posner, M. I. (1969) Abstraction and the process of recognition. In The
    Psychology of Learning and Motivation: Advances in Research and Theory,
    Volume 3, ed. by G. H. Bower and J. T. Spence. (New York: Academic).
Repp, B. H. (1973) Dichotic forward and backward masking of CV syllables.
    Unpublished Ph.D. dissertation, University of Chicago.
Repp, B. H. (1975a) Dichotic forward and backward "masking" between CV
    syllables. J. Acoust. Soc. Am. 57, 483-496.
Repp, B. H. (1975b) Distinctive features, dichotic competition, and the
    encoding of stop consonants. Percept. Psychophys. 17, 231-240.
Repp, B. H. (1976) Effects of fundamental frequency contrast on identification
    and discrimination of dichotic CV syllables at various temporal delays.
    Mem. Cog. 4, 75-90.
Rosch, E. (1975) The nature of mental codes for color categories. J. Exp.
    Psychol.: Human Perception and Performance 1, 303-322.
Shankweiler, D. and M. Studdert-Kennedy. (1967a) Identification of consonants
    and vowels presented to left and right ears. Quart. J. Exp. Psychol. 19,
    59-63.
Shankweiler, D. and M. Studdert-Kennedy. (1967b) An analysis of perceptual
    confusions in identification of dichotically presented CVC syllables.
    Haskins Laboratories Status Report on Speech Research SR-10, 63-73.
Shankweiler, D. and M. Studdert-Kennedy. (1975) A continuum of lateralization
    for speech perception? Brain Lang. 2, 212-225.
Speaks, C. and L. Bissonette. (1975) Interaural-intensive differences and
    dichotic listening. J. Acoust. Soc. Am. 58, 893-898.
Studdert-Kennedy, M. (1975) Two questions. Brain Lang. 2, 123-130.
Studdert-Kennedy, M. (in press) Speech perception. In Contemporary Issues in
    Experimental Phonetics, ed. by N. J. Lass. (New York: Academic).
Studdert-Kennedy, M. and D. Shankweiler. (1970) Hemispheric specialization for
    speech perception. J. Acoust. Soc. Am. 48, 579-594.






Discrimination of Dichotic Fusions 



Bruno Repp* 



ABSTRACT 

The discriminability of dichotic fusions (dichotic voiced stop-consonant-
plus-vowel syllables from the "place continuum" /bæ/-/dæ/-/gæ/) was assessed
in an AXB paradigm by presenting stimuli composed of a variable stimulus in
one ear and a constant stimulus (either /bæ/ or /gæ/) in the other ear. In a
control condition, the variable stimuli were presented without the constant
stimulus. On the "categorical-perception" assumption that syllables are
discriminated only as well as their labels, dichotic discrimination
performance was predicted to be poor and without the typical peaks and troughs
observed in single-channel discrimination. However, the obtained
discrimination functions showed basically the same peaks and troughs as the
single-channel functions, regardless of the nature of the constant stimulus;
only performance was lower. A second experiment, employing three variants of
the dichotic discrimination task, ruled out selective attention to one channel
as an explanation. The results strongly suggest that the discrimination of
speech sounds is not based on their phonetic labels but on lower-level codes
whose discrete elements represent the proximities of the stimuli to several
fixed "prototypes" ("multicategorical vectors"). Dichotic integration is
assumed to precede discrimination and to consist of a weighted averaging of
the multicategorical vectors of the component stimuli. (The weights represent
ear dominance effects, which tended to favor the right ear but were not very
consistent in the present discrimination tasks.)

INTRODUCTION

Many recent studies of dichotic listening have employed as stimuli the six
stop consonants, followed by a constant vowel (e.g., /ba/, /da/, /ga/, /pa/,
/ta/, /ka/). It has been known for some time that some of the dichotic
contrasts made up from these stimuli tend to fuse and sound like a single syllable



*Also University of Connecticut Health Center, Farmington.

Acknowledgment: This research would not have been possible without the
generous hospitality of Haskins Laboratories and its director, Alvin Liberman.
I am deeply grateful to A. M. Liberman for his helpful comments and his
interest in this research. The author was supported by NIH grant T22 DE00202 to
the University of Connecticut Health Center.

[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]




(Halwes, 1969; Repp, 1976a; Cutting, 1976). In a recent paper, Repp (1976b)
demonstrated that dichotic pairs of precisely aligned synthetic syllables
differing only in the initial formant transitions (/bæ/, /dæ/, /gæ/) are
virtually indistinguishable from binaural syllables; in other words, they fuse
perfectly and sound like single (binaural) syllables. Repp (1976b) presented
detailed identification data for such dichotic fusions, and he also showed that
the characteristic right-ear advantage is obtained with these stimuli and that
paying selective attention to one ear has little effect on the responses.

The present studies investigated the discrimination of these dichotic
fusions. The principal question was: Is the perception of dichotic fusions
categorical? It is well known that single syllables that differ only in their
formant transitions (i.e., syllables from a "place continuum") are discriminated
very poorly as long as they fall within the same category (Liberman, Harris,
Hoffman, and Griffith, 1957; Eimas, 1963; Pisoni, 1971). Discrimination
performance can be fairly accurately predicted from knowledge of the labeling
function, assuming that discrimination relies solely on the phonetic categories
assigned to the stimuli. Since single syllables are perceived in this
categorical fashion, and since dichotic fusions sound like single syllables, it
was only reasonable to expect that their perception would likewise be
categorical, so that discrimination performance could be accurately predicted
from identification data for the same fusions. However, the possibility remained
that, despite subjective fusion, information from the individual channels might
be accessible to some degree in a discrimination task; in this case, performance
should be better than predicted.

The task selected was termed "one-ear discrimination." It required the
listener to discriminate between two dichotic fusions that differed only in the
component presented to one ear (the variable stimulus) but not in the other
component (the constant stimulus; cf. Figure 2). Of course, the subjects were
not aware of the separate components but heard only single, fused syllables. In
a control condition, the variable stimuli were presented by themselves, without
the constant stimulus. By comparing the results of this single-channel control
with those of dichotic one-ear discrimination, the effect of the constant
stimuli could be ascertained. "Categorical-perception" predictions were derived
from the identification data in Repp (1976b).

A secondary question concerned the dichotic right-ear advantage (REA).
Since the variable stimulus could occur either in the left or the right ear,
one-ear discrimination performance was expected to be higher when it was in the
right ear. The magnitude of the REA could actually be predicted from the
identification data, and the one-ear discrimination task seemed interesting as a
possible alternative to identification tasks in assessing ear advantages. On
the other hand, the REA might turn out to be either larger or smaller than
predicted. The first outcome would suggest that the discrimination task is a
more sensitive indicator of ear asymmetries than the identification task, while
the second outcome would suggest that the listeners base their discriminations
on stimulus codes that are less lateralized or bilaterally represented. Both
outcomes would be in disagreement with the assumptions of categorical perception.

Before the experiments are discussed, two remarks on methodology are in
order.




An AXB discrimination paradigm was used in all the present studies: three
successive stimuli were presented, and the listener had to decide whether the
second stimulus was equal to the first (AAB or same-different configuration) or
to the third (ABB or different-same configuration). This paradigm has been
rarely used in the past, although it seems to combine the advantages of the more
popular ABX and 4IAX paradigms (Pisoni, 1971; Pisoni and Lazarus, 1974). Pisoni
has demonstrated that the 4IAX paradigm, which consists in judging which of two
successive pairs of stimuli contains a difference, leads to higher performance
than the ABX paradigm, presumably because of the possibility of a "second-order"
comparison between subjective differences. The AXB paradigm also allows such
second-order comparisons (of the A-X difference with the X-B difference), since
the two identical stimuli never straddle the odd one (as is the case in the ABA
configuration of the ABX paradigm). Thus, AXB may well be as sensitive as 4IAX,
but it is as economical as ABX, since only three stimuli are presented in a
trial.

In the tasks described here, it is important that the dichotic syllables
are exactly simultaneous. Even very small asynchronies may lead to changes in
the subjective location of successive fused stimuli (henceforth referred to as
"location shifts"), which will aid discrimination and confound the results.
Cherry and Sayers (1956) and, more recently, Young, Parker, and Carhart (1975)
have shown that the discrimination threshold for temporal asynchronies between
binaural speech sounds is as low as 0.02 to 0.03 msec, which sets the upper
limit for the permissible error in the present studies. This precision is not
achieved by standard procedures for recording dichotic tapes, a fact that was
fully realized only after the present experiments (Experiments IA and IIA) had
been conducted. Therefore, both experiments were replicated after a procedure
for more precise syllable alignment had been devised, and the original studies
will be described together with their replications (Experiments IB and IIB).
With the exception of one part of Experiment IIA, which showed evidence of
artifacts, the replications confirmed the original data.

EXPERIMENTS IA AND IB

Method

Subjects. There were seven subjects in Experiment IA and seven different
subjects in the replication, Experiment IB. All were paid volunteers,
right-handed, and relatively inexperienced listeners. The subjects of
Experiment IA had previously participated in an identification task using the
same stimuli (Repp, 1976b: Experiment I). The data of one additional subject in
each study were excluded because they were at chance level.

Stimuli. The stimuli were seven syllables from a "place continuum" (Pisoni,
1971), ranging perceptually from /bæ/ to /dæ/ to /gæ/. They were produced on
the Haskins Laboratories parallel resonance synthesizer. All syllables had the
same duration (280 msec), a constant fundamental frequency (114 Hz), a
voice-onset time (VOT) of -15 msec (i.e., prevoicing), 45-msec linear
transitions, and no bursts but an abrupt onset of energy following the
prevoicing. They differed only in the onset frequencies of the second-formant
(F2) and third-formant (F3) transitions, which are shown in Table 1.






TABLE 1: Starting frequencies (in Hz) of second-formant (F2) and third-formant
(F3) transitions of the seven stimuli.

    Stimulus No.      F2      F3

         1           1312    2348
         2           1456    2694
         3           1620    3026
         4           1772    3026
         5           1920    2694
         6           2078    2348
         7           2234    2018

         M           1620    2862



Dichotic pairs were constructed using the pulse code modulation (PCM)
system at Haskins Laboratories. This procedure involved digital sampling of the
synthesizer output with a standard sampling rate of 8,000/sec in Experiment IA,
resulting in a random sampling error not exceeding 0.125 msec, which remained
fixed for each individual stimulus. In addition, because the smallest accessible
unit was two samples, the onset of a stimulus could be in an even or in an odd
sample, so that the onsets of two dichotic syllables could be off by ±0.125
msec. (This was probably the more important factor.) In Experiment IB, all
syllables were redigitized until they all started in an odd sample, which
eliminated the onset asynchronies. In addition, a faster sampling rate was used
(20,000/sec), which reduced the random error to below 0.05 msec. Furthermore, a
magnified section of the steady-state vowel of each stimulus was displayed on a
storage oscilloscope and compared to a standard waveform selected from one of
the stimuli. Poor matches were rejected, and the stimuli were redigitized until
their waveforms matched the standard quite well. This procedure reduced the
random error to at least half its magnitude and, thus, below the detection
threshold for "location shifts."¹
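
The error bounds quoted above follow directly from the sampling rates, since
the largest random misalignment is one sample period. A minimal sketch of that
arithmetic (the function name is illustrative, not part of the original
procedure):

```python
def sample_period_msec(rate_per_sec):
    """Duration of one sample, in milliseconds, at the given sampling rate."""
    return 1000.0 / rate_per_sec


for label, rate in (("Experiment IA", 8000), ("Experiment IB", 20000)):
    period = sample_period_msec(rate)
    print(f"{label}: {rate}/sec -> one sample = {period:.3f} msec")
```

At 8,000 samples per second one sample lasts 0.125 msec, and at 20,000 samples
per second it lasts 0.05 msec, matching the bounds stated for the two
experiments.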

In Experiment IB, the following characteristics of the stimuli were
inadvertently changed: overall duration was reduced to 196 msec, prevoicing to
10 msec, and transition duration to 38 msec.

The experimental tape of Experiment IA contained first a random series of
44 AXB triads of single syllables (i.e., in one channel only). Only the six
"one-step" (1 vs. 2, 2 vs. 3, etc.) and the five "two-step" (1 vs. 3, 2 vs. 4,
etc.) discriminations were included, in each of the four possible AXB
configurations (AAB, ABB, BBA, BAA). This was followed by a series of 88
dichotic triads in which the same (variable) stimuli in one ear were combined
with the constant stimulus 1 (/bæ/) in the other ear. The variable stimulus
could occur either in the left or the right ear. Another similar series of 88
dichotic triads followed in which the constant stimulus was 7 (/gæ/). Finally,
there was another series of 44 single-channel triads. The interstimulus interval
was 1 sec and the intertriad interval 3 sec.
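
The trial counts can be checked by enumerating the design. The sketch below
only reproduces the combinatorics described above (11 stimulus pairs, 4 AXB
configurations, 2 ear assignments); it does not randomize the order, and the
function names and the (left, right) tuple layout are assumptions made for
illustration.

```python
STIMULI = range(1, 8)  # the seven syllables of the place continuum


def stimulus_pairs(step):
    """All pairs (a, b) of stimuli that lie `step` positions apart."""
    return [(a, a + step) for a in STIMULI if a + step <= 7]


def axb_triads(a, b):
    """The four AXB configurations for one stimulus pair."""
    return {"AAB": (a, a, b), "ABB": (a, b, b), "BBA": (b, b, a), "BAA": (b, a, a)}


def single_channel_series():
    """One-step and two-step pairs in every AXB configuration."""
    pairs = stimulus_pairs(1) + stimulus_pairs(2)
    return [triad for a, b in pairs for triad in axb_triads(a, b).values()]


def dichotic_series(constant):
    """Each single-channel triad paired with a constant stimulus in the other ear.

    Every triad element is a (left, right) tuple; the variable stimulus
    occurs once in the left ear and once in the right ear.
    """
    series = []
    for triad in single_channel_series():
        series.append([(v, constant) for v in triad])  # variable in left ear
        series.append([(constant, v) for v in triad])  # variable in right ear
    return series


print(len(single_channel_series()))  # single-channel triads
print(len(dichotic_series(1)))       # dichotic triads with constant stimulus 1
```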



¹It should be noted that neither the author nor any of the other subjects
reported any location shifts in Experiment IA. Nevertheless, the replication
seemed an appropriate cautionary measure.




The experimental tape of Experiment IB differed from that of Experiment IA
in that the two constant stimuli, 1 and 7, were not blocked but randomized, so
that the dichotic triads constituted a single block of 176 trials.

Procedure. The subjects were tested in small groups, usually joined by the
experimenter, in a single session lasting approximately 90 minutes. Playback
was from an Ampex AB-500 tape recorder through an amplifier to Telephonics
TDH-39 earphones. Playback intensity was adjusted and monitored on a
Hewlett-Packard voltmeter, and special care was taken to equalize the
intensities of the two channels at about 85 dB SPL (peak deflections for
individual syllables), which was the intensity used in the earlier
identification study.

Each subject listened to the experimental tape twice. The earphone channels
were interchanged electronically before the second run. The single-channel
trials were presented binaurally in Experiment IA but monaurally in Experiment
IB. The AXB paradigm was explained in detail: X was described as a variable
stimulus that could be equal either to A or to B, the latter two always being
different from each other. Correspondingly, the subjects were asked to write
down A or B as their responses and to guess when uncertain. The two
configurations A = X ≠ B (same-different) and A ≠ X = B (different-same) were
pointed out and appeared in this form as a reminder on the answer sheets. This
was intended to guide the subjects to a processing strategy similar to that in
a 4IAX paradigm. The subjects were not informed about the dichotic nature of
the stimuli until after the experiment.

In summary, Experiment IB differed from Experiment IA by (1) more precise
stimulus synchronization, (2) shorter stimulus and transition durations, (3)
monaural instead of binaural presentation of single-channel trials, and (4)
random instead of blocked sequences of the two constant stimuli in dichotic
trials. However, none of these changes was expected to have any great effect,
and Experiments IA and IB were expected to agree in their main results.

Results

Single-channel discrimination. The average single-channel discrimination
performance in the two experiments is shown in the left-hand panels of Figure 1.
The upper panel also shows the functions predicted from the identification data
(Repp, 1976b: Experiment I), assuming perfect categorical perception and
absence of sequential effects. The prediction formula is the same as in the
ABX and 4IAX paradigms (Pollack and Pisoni, 1971).
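
The formula itself is cited rather than reproduced in this section. As a
sketch of how a label-based prediction of this kind can be derived, assume that
the three stimuli in a triad are labeled independently, that the listener
answers from the labels whenever the label of X matches exactly one of the
other two, and that the listener guesses otherwise. Under these assumptions
(made for this illustration, and possibly differing in detail from the cited
treatment), the expected proportion correct is 0.5 + 0.25 Σ [p_A(k) - p_B(k)]²,
where p_A(k) is the probability that stimulus A receives label k. The labeling
probabilities in the example are hypothetical, not data from the experiments.

```python
def predicted_proportion_correct(labels_a, labels_b):
    """Label-only prediction of discrimination accuracy for stimuli A and B.

    labels_a and labels_b give, for each phonetic category, the probability
    that the stimulus receives that label (each list should sum to 1).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both stimuli need probabilities for the same categories")
    squared_difference = sum((pa - pb) ** 2 for pa, pb in zip(labels_a, labels_b))
    return 0.5 + 0.25 * squared_difference


# Hypothetical labeling probabilities for the categories (B, D, G).
within_category = predicted_proportion_correct([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
across_boundary = predicted_proportion_correct([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(within_category, across_boundary)
```

Two stimuli that always receive the same label are predicted to be
discriminated at chance (0.5), and two stimuli that always receive different
labels are predicted to be discriminated perfectly (1.0); intermediate labeling
functions produce the peaks and troughs discussed in the text.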

The discrimination functions show the characteristic peaks and troughs of
categorical perception (Pisoni, 1971). They are more pronounced in Experiment
IB than in Experiment IA, indicating that the syllables in the replication study
were labeled more consistently. There are some deviations from the predictions
in Experiment IA. Most of these can be explained by a shift in the labeling of
stimulus 5 toward G, relative to the earlier identification experiment. (There,
stimulus 5 had been the only truly ambiguous syllable, and it had received
somewhat more D than G responses.) The functions in the lower panel indicate
that stimulus 5 was consistently labeled G in Experiment IB. One deviation from
the predictions that cannot be explained by a shift in labeling responses is the
better-than-chance discrimination of 1 from 2; both syllables were perceived as
B in the identification study, and other features of the discrimination data
suggest the same for the present studies. This seems to be an instance of true
within-category discriminability. On the whole, however, the data are in good
agreement with the categorical-perception assumption that the stimuli were
discriminated little better than by their labels alone.

[Figure 1: Percent correct discrimination in Experiment IA (upper panels) and
Experiment IB (lower panels). Left-hand panels: single-channel discrimination;
middle and right-hand panels: one-ear discrimination with constant stimuli 1
and 7, respectively.]

One-ear discrimination. The middle and right-hand panels of Figure 1
illustrate the predicted and obtained one-ear discrimination results with 1 and
7, respectively, as constant stimuli. It is obvious that the obtained
discrimination functions diverged dramatically from the predicted functions.
Performance was expected to be quite poor, especially in the +1 condition (due
to stronger perceptual dominance of 1 than 7 in dichotic competition, as
observed in the identification study), and no pronounced peaks and troughs in
the discrimination functions were predicted (owing to generally inconsistent
labeling of dichotic fusions). The obtained functions, on the other hand, did
show clear peaks and troughs, and performance was generally much higher than
expected.

This was equally true for both experiments, which demonstrates that the
results in Experiment IA were not due to artifactual "location shifts." (In any
case, no such shifts were heard during the experiment.) An analysis of variance
showed no significant difference in overall performance level between the two
experiments, nor was there any difference between the overall effects of the two
constant stimuli.² Those differences that did exist between the two sets of
data were probably due to intersubject variability, blocked versus random
constant stimuli, and perhaps the changes in acoustic stimulus structure.

A comparison of the dichotic with the single-channel discrimination
functions in Figure 1 shows that performance was lower in the dichotic condition
but that the pattern remained basically the same. Despite considerable variation
in detail, the location of the major peaks and troughs did not change as a
function of the constant stimulus in the other ear.³ The only shift may be seen
in the +7 condition of Experiment IB, where the first valley of the two-step
function has shifted to the left, that is, away from the constant stimulus.


Ear advantages. The seven subjects in Experiment IA showed only a very
small and nonsignificant average REA (φ = +0.02; cf. Kuhn, 1973). In the +7
condition, there was actually a small left-ear advantage (LEA), while, in the
+1 condition, there was a somewhat larger REA (φ = +0.06; χ²(1) = 3.8; p = .05).
Although the same seven subjects had shown an average REA of φ_E = +0.08 in the



²The relatively poor two-step discrimination in the +1 condition of Experiment
IB was not tested for significance and remains unexplained.

³Note that this is the opposite effect from that of adaptation, where a
migration of peaks and valleys toward the adapting stimulus occurs (Cooper,
1974). Apparently, little adaptation took place in the blocked conditions of
Experiment IA. (The possibility of such adaptation effects had prompted the
randomization of constant stimuli in the replication study.) The author
participated as an additional subject in four sessions of each experiment. His
data generally confirmed the results of the less experienced listeners, except
that he showed more pronounced migrations of the discrimination peaks away from
the constant stimulus end of the continuum. An explanation for this deviant
result will be suggested in Footnote 8.




identification task (see Repp, 1976b, about calculation of φ_E), the REA in
discrimination was predicted to be smaller, so that the obtained average REA was
of the expected magnitude. However, there was no agreement with the predictions
at a more detailed level, and there was no relationship between the ear
advantages in the identification and discrimination tasks. No individual φ
coefficient reached significance.

In Experiment IB, there was no average ear asymmetry at all, and there was
no difference between the +1 and +7 conditions. However, two subjects showed
significant individual ear asymmetries (one REA and one LEA).⁴

A joint analysis of variance of the two experiments showed no significant
ear advantage. The triple interaction that reflects the REA in the +1 condition
of Experiment IA was only marginally significant.
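
The φ coefficients and χ² values reported in this section are linked by a
standard identity for 2 × 2 tables with one degree of freedom, χ² = Nφ², where
N is the total number of observations. The sketch below illustrates the
identity with hypothetical counts. The table layout (rows for the ear of the
variable stimulus, columns for correct and incorrect responses) is an
assumption made for illustration; the layout actually used for the
coefficients is described in Kuhn (1973) and Repp (1976b), not here.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2 x 2 table [[a, b], [c, d]]."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator


def chi_square_from_phi(phi, n):
    """Pearson chi-square (1 df, no continuity correction) implied by phi."""
    return n * phi ** 2


# Hypothetical counts: rows = right ear / left ear, columns = correct / incorrect.
phi = phi_coefficient(60, 40, 40, 60)
print(round(phi, 3), round(chi_square_from_phi(phi, 200), 3))
```

The identity explains why a small coefficient can still reach significance
when the number of trials is large, and why the same coefficient may fall short
with fewer trials.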

Discussion 

The results present an interesting paradox: the perception of dichotic
fusions was both categorical and noncategorical. It was categorical because the
discrimination functions showed peaks and troughs. At the same time, it was
noncategorical because performance was much better than predicted from the
identification data. It should be noted that, contrary to single-channel
discrimination, no peaks and troughs were predicted for dichotic discrimination
because of the absence of clear categories in the identification of dichotic
fusions due to the relative dominance of the constant stimulus (see Repp, 1976b:
Figure 1). In a sense, all dichotic fusions with a given constant stimulus were
within a single, ill-defined category; hence, the poor expected performance.

The discrepancy between predicted and obtained data is evidence that the
subjects did not base their discriminations on the labels assigned to the
dichotic fusions. What, then, formed the basis of their responses? One obvious
possibility, suggested by the general coincidence of the discrimination peaks
in the single-channel and dichotic conditions, is that the subjects had access
to the information from the separate channels prior to its fusion and
integration. Under this hypothesis, they were simply discriminating the variable
stimuli and ignored the constant stimuli, which only had the effect of noise and
led to a generally lower level of performance. In order to test this hypothesis,
two new discrimination tasks were devised that are relevant to the question of
channel accessibility. Because of the unclear results with respect to the REA,
it also seemed desirable to obtain further data on one-ear discrimination, so
that Experiments IIA and IIB contained three different discrimination tasks.

EXPERIMENTS IIA AND IIB



The three discrimination tasks are illustrated in Figure 2. The first
task was one-ear discrimination, as in Experiments IA and IB. The second task
was termed "reversal discrimination." It consisted in telling apart two dichotic



⁴The author, who had shown a reliable REA in the identification task, exhibited
only a small and nonsignificant REA in Experiment IA (φ = +0.02) but a much
larger and significant effect in Experiment IB (φ = +0.10; χ²(1) = 12.8,
p < .001).




fusions made up of the same components, the only difference being the channel
(ear) assignment of these components. The third task was a combination of the
other two and was called "crossover discrimination." It consisted in
discriminating two dichotic fusions that differed in only one component, which,
however, "crossed over" to the opposite ear. In other words, crossover
discrimination was a "one-ear" discrimination with an additional channel
reversal.



          ONE-EAR       REVERSAL      CROSS-OVER
           L   R         L   R          L   R
           1 + 7         1 + 7          1 + 7
           3 + 7         7 + 1          7 + 3

Figure 2: Three discrimination tasks with fused syllables. The numbers
represent individual stimuli.



Both new tasks address the channel-accessibility hypothesis. On the
categorical-perception assumption that it is the labels that are discriminated,
performance in reversal discrimination should be close to chance. In fact, in
the absence of any ear asymmetry, reversal discrimination should be impossible.
Performance should improve in proportion to the ear advantage (regardless of its
direction), but since ear advantages for dichotic fusions are generally small
(Repp, 1976b), the expected level of accuracy remained very low. On the other
hand, the subjects should be much more successful if they had access to the
separate channels, since each channel contains a discriminable difference.

A similar argument may be made for crossover discrimination. On
categorical-perception assumptions, crossover discrimination should be as easy
(or as difficult) as one-ear discrimination of the same stimuli, except for
small differences due to ear asymmetries. However, if the listeners had access
to the individual channels, performance should be considerably higher in
crossover discrimination. Not only are there discriminable differences in both
channels (as opposed to one channel in one-ear discrimination), but these
differences are also typically easier to detect than those in the variable
channel of one-ear discrimination (cf. Figure 2).

Method 

Subjects . There were nine subjects (one left-handed) in Experiment IIA and 
ten subjects (two left-handed) in Experiment IIB; three of these subjects took 
part in both experiments. 






Materials. The syllables were those of Table 1, with one additional syllable
from the lower (/bæ/) end of the continuum; it was called stimulus 0 and had
transitions starting at 1155 Hz (F2) and 2018 Hz (F3). The recording procedures
of Experiments IIA and IIB were identical to those of Experiments IA and IB,
respectively.

The experimental tapes contained first a series of 64 triads of single
syllables, which were presented monaurally in both experiments. The series
contained the four ABX configurations of each of 16 stimulus discriminations:
the four two-step discriminations, 1 vs. 3, 2 vs. 4, 3 vs. 5, and 4 vs. 6, and
all discriminations of stimuli 0 and 7, respectively, from stimuli 1 through 6.
This series was followed by a completely randomized series of 176 dichotic
triads, comprising 64 one-ear trials, 64 crossover trials, and 48 reversal
trials. The one-ear and crossover triads represented the four two-step
discriminations with either 0 (/bæ/) or 7 (/gæ/) as the constant stimulus, in
all possible triad and channel configurations. The reversal triads consisted of
the dichotic combinations of 0 and 7, respectively, with stimuli 1 through 6, in
all AXB configurations.
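
The trial counts in the preceding paragraph can be checked by enumerating the design. The following Python sketch is only an illustration: the names `TRIAD_CONFIGS` and `CHANNEL_CONFIGS` are hypothetical, the value of four triad configurations is taken from the text, and the value of two channel assignments per dichotic triad is an assumption chosen because it reproduces the reported totals.

```python
from itertools import product

# Assumed design constants (hypothetical names, not from the original report).
TRIAD_CONFIGS = 4      # four triad orderings per discrimination, as in the text
CHANNEL_CONFIGS = 2    # assumption: two ear assignments per dichotic triad

TWO_STEP = [(1, 3), (2, 4), (3, 5), (4, 6)]  # the four two-step discriminations
ENDPOINTS = [0, 7]                            # endpoint stimuli /bæ/ and /gæ/
INNER = list(range(1, 7))                     # stimuli 1 through 6

# Monaural series: two-step pairs plus each endpoint against stimuli 1-6.
monaural_pairs = TWO_STEP + list(product(ENDPOINTS, INNER))
monaural_triads = len(monaural_pairs) * TRIAD_CONFIGS

# Dichotic series: one-ear, crossover, and reversal triads.
one_ear = len(TWO_STEP) * len(ENDPOINTS) * TRIAD_CONFIGS * CHANNEL_CONFIGS
crossover = one_ear  # same pairs and constants, with a channel reversal added
reversal = len(ENDPOINTS) * len(INNER) * TRIAD_CONFIGS
dichotic_triads = one_ear + crossover + reversal

print(monaural_triads, one_ear, crossover, reversal, dichotic_triads)
# prints: 64 64 64 48 176
```

Under these assumptions the enumeration agrees with the 64 monaural triads and the 176 dichotic triads (64 + 64 + 48) stated above.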

Procedure. Each subject listened to the tape twice, with a pause in between
during which the earphone channels were reversed. Otherwise, the procedure was
identical to that in the previous experiments.

Results

Single-channel discrimination. The overall accuracy of monaural
discrimination was the same in the two experiments (Experiment IIA: 81.6 percent
correct; Experiment IIB: 81.3 percent correct). A more detailed breakdown of the
results is shown in the left-hand portions of Tables 2 and 3. Obviously,
stimuli 1 and 2 were difficult to discriminate from 0, and 5 and 6 were
difficult to discriminate from 7; these stimuli fell within the B and G
categories, respectively. Table 2 shows that discrimination from 0 became
relatively easier and discrimination from 7 became relatively more difficult in
Experiment IIB, both within and between categories. The reason for this
interaction is not clear.

Reversal discrimination. The reversal discrimination results are shown in
Table 2. In the data of Experiment IIA, at least three stimulus combinations can
be discerned for which artifactual location shifts apparently provided a valid
cue (underlined in Table 2), although performance did not exceed 75 percent
correct even in those pairs. However, there was surprisingly little change in
overall accuracy from Experiment IIA to Experiment IIB; in fact, performance
improved for six of the stimulus combinations. This suggests that the naive
subjects profited relatively little from location shift cues. All in all,
performance remained quite poor, though perhaps somewhat better than expected.⁵






⁵This experiment was preceded by a pilot study of reversal discrimination,
which was beset with "location shift" artifacts. However, the inexperienced
subjects apparently did not profit much from this additional cue and performed
poorly (59.1 percent correct), although somewhat better than predicted from the
identification study in which these subjects had participated (53.8 percent
correct). The most interesting result of the pilot study was the complete
ineffectiveness of an additional independent variable: attenuation of one channel




TABLE 2: Monaural and dichotic reversal discrimination in Experiments IIA and
         IIB (percentages of correct responses). (Note: The underlines
         indicate probable artifacts due to "location shifts.")

                              Monaural            Reversals
Experiment    Stimuli      vs. 0    vs. 7       +0       +7

IIA              1          44.4     93.2      51.4     75.0
                 2          66.7    100.0      55.6     73.6
                 3          83.3     95.8      59.7     55.6
                 4          91.7     94.6      62.5     55.6
                 5          88.9     76.4      51.4     54.2
                 6          91.7     48.6      72.2     58.3
                 Mean       77.8     85.4      58.8     62.1

IIB              1          57.5     93.7      61.2     62.5
                 2          71.2     83.7      58.7     41.2
                 3          86.2     91.2      57.5     57.5
                 4          96.2     91.2      67.5     66.2
                 5          90.0     63.7      60.0     48.7
                 6          95.0     58.7      58.7     51.2
                 Mean       82.7     80.4      60.6     54.6



TABLE 3: Monaural controls and dichotic one-ear and crossover discrimination in
         Experiments IIA and IIB (percentages of correct responses).

                                         One-ear            Crossover
Experiment    Stimuli      Monaural     +0       +7        +0       +7

IIA           1 vs. 3        76.4      63.9     59.7      68.7     72.2
              2 vs. 4        79.2      61.8     66.7      68.7     66.0
              3 vs. 5        79.2      59.0     68.1      66.7     60.4
              4 vs. 6        91.7      65.3     58.3      73.6     59.7
              Mean           81.6      62.5     63.2      69.4     64.6

IIB           1 vs. 3        85.0      73.7     77.5      81.2     79.4
              2 vs. 4        78.7      71.2     68.7      69.4     66.9
              3 vs. 5        71.2      60.0     76.9      61.2     75.0
              4 vs. 6        87.5      73.7     76.2      76.2     84.4
              Mean           80.6      69.7     74.8      72.0     76.4




by 10 dB (channel intensities at 85 and 75 dB). Although the fused syllables
were lateralized toward the ear with the louder stimulus, the perceptual
dominance of the louder syllable did not increase, and performance even
decreased slightly. This is in agreement with the results of Cullen, Thompson,
Hughes, Berlin, and Samson (1974) and Speaks and Bissonette (1975), who varied
relative intensity in identification studies and obtained no effect in this
range.




In Experiment IIB, performance in reversal discrimination correlated
moderately (r = +0.45) with the absolute size of the ear advantage in one-ear
discrimination, as predicted; however, the correlation did not reach
significance. The variation in accuracy between different stimulus combinations
followed no interpretable pattern.

One-ear and crossover discrimination. These results are shown in Table 3.
In Experiment IIB, 1 vs. 3 and 4 vs. 6, which were discriminated best
monaurally, also showed the highest scores in one-ear and crossover
discrimination, in agreement with Experiments IA and IB. (In Experiment IIA,
there was no clear pattern.) Performance improved significantly (p < .01) from
Experiment IIA to Experiment IIB. This suggests that location shifts played no
role in these tasks in Experiment IIA, which agrees with subjective evidence
and the comparison of Experiments IA and IB.

Crossover discrimination was slightly superior to one-ear discrimination
(p < .02). The effect was more pronounced in Experiment IIA, but there was no
significant interaction with experiments.

There was evidence of a REA in one-ear discrimination. Eight out of nine
subjects in Experiment IIA and seven out of ten subjects in Experiment IIB
showed a REA; one REA in Experiment IIA and two REAs and two LEAs in Experiment
IIB were significant at the individual level. The average REAs corresponded to
φ coefficients (Kuhn, 1973) of +0.07 (p < .05) in Experiment IIA and +0.05
(p > .10) in Experiment IIB. In the analysis of variance, the overall REA was
only marginally significant (p < .10). However, there was a significant
interaction with the nature of the constant stimulus (p < .003). As in
Experiment IA, the REA was much larger when the constant stimulus was /bæ/ than
when it was /gæ/. In fact, a small REA in Experiment IIA and a small LEA in
Experiment IIB averaged out to zero in the +7 condition, while the +0 condition
showed fairly large REAs in both experiments (φ = +0.10 and +0.15,
respectively; both p < .01).

In crossover discrimination, there was also a marginally significant
overall ear asymmetry (p < .06), which, however, occurred only in Experiment
IIA: performance was higher when the acoustically more dissimilar stimuli were
in the right ear. (For example, in 0 + 3 vs. 5 + 0, performance was higher when
0 and 5 were in the right ear.) This ear asymmetry in Experiment IIA
corresponded to a φ coefficient of +0.09 (p < .01).

Discussion 

The results showed reversal discrimination to be better than expected and
crossover discrimination to be easier than one-ear discrimination. However,
these effects were rather small and do not justify the conclusion that the
listeners had access to the information in the separate channels prior to
fusion. If one-ear discrimination were to be explained by the
channel-accessibility hypothesis, reversal discrimination should have been
considerably easier than one-ear discrimination. This was clearly not the case.
The hypothesis also contradicts the subjective impression of perfect fusion and
Repp's (1976b: Experiment IV) demonstration that selective attention to one ear
is ineffective, and it must therefore be dismissed.



The small effects found were perhaps due to variations in ear dominance
within subjects. It is quite conceivable that ear dominance is a rather unstable
characteristic that exhibits fluctuations over time. Such variations around a
mean value would aid reversal and crossover discrimination, especially in
individuals with no strong ear asymmetries.

There was a REA in one-ear discrimination, but only when the constant
stimulus was /bæ/, as in Experiment IA. This puzzling finding, together with
the apparent unreliability of the REA and the tediousness of the task, does not
make these discrimination tasks a promising alternative to dichotic
identification tests as instruments for assessing ear advantages.

GENERAL DISCUSSION

What is the nature of the stimulus representations that the subjects tried
to discriminate? They are not the phonetic labels of the dichotic fusions,
because the obtained discrimination results did not conform to the predictions
from the dichotic labeling functions (Experiments IA and IB). They are not the
phonetic labels of the variable stimuli alone (prior to integration and fusion
with the constant stimuli in the other channel), because accessibility of
individual channels seems highly unlikely (Experiments IIA and IIB). Nor can
they be "raw" auditory representations retained in some short-term store
(Pisoni, 1971; Pisoni and Lazarus, 1974), since discrimination of low-level
auditory codes would be expected to be more or less continuous and could not
lead to the pronounced peaks and troughs in the one-ear discrimination functions
(Experiments IA and IB). This leads to the conclusion that the codes that are
discriminated must be an intermediate stage between initial auditory analysis
and the final phonetic label, and that they most likely represent the integrated
information from the two ears and not a single channel.

This intermediate stage can be more precisely specified within the
framework of certain models of speech perception that postulate a limited number
of discrete analyzing mechanisms that intervene between the auditory input and
the phonetic label. These mechanisms may be termed "feature detectors" (Eimas
and Corbit, 1973; Cooper, 1974; Cooper and Nager, 1975) or "prototypes" (Repp,
1976b); the distinction, while important in other contexts,⁶ need not concern us
here. Let us assume, then, that there are three "prototypes" corresponding to
the three categories (B, D, G), and that each prototype exhibits maximal
"sensitivity" to the acoustic input most appropriate for the corresponding
category. So, for example, a stimulus from the /bæ/ end of the place continuum
will "activate" the B prototype most and the D and G prototypes only little; the
next stimulus on the continuum, still heard as /bæ/ but closer to /dæ/ than the
first syllable, will activate the B prototype a little less and the D prototype
a little more, and so on. Such hypothetical activation values for the present
stimuli (Table 1) are illustrated in Table 4 ("single-channel").⁷






⁶Repp, B. H. Dichotic competition of speech sounds: The role of acoustic
stimulus structure. Unpublished manuscript.

⁷The degree of activation of a prototype most likely bears a nonlinear
relationship to the acoustic "distance" between stimulus and prototype. The
exact function will depend on the "response characteristic" of the prototype or
on the "distribution of excitation" around the stimulus, about which little is
known at the present time.




TABLE 4: Fictitious multicategorical vectors and one-step discriminability
         indices in single-channel and dichotic one-ear discrimination.

            Single-channel              One-ear, +1                One-ear, +7
Stimuli   B   D   G  (Σd²)^½      B    D    G   (Σd²)^½      B    D    G   (Σd²)^½

   1      8   1   1               8    1    1                4.5  1    4.5
   2      7   2   1               7.5  1.5  1                4    1.5  4.5
   3      2   6   2               5    3.5  1.5   2.9        1.5  3.5  5     2.9
   4      1   7   2               4.5  4    1.5              1    4    5
   5      1   4   5               4.5  2.5  3                1    2.5  6.5
   6      1   2   7    2.8        4.5  1.5  4     1.0        1    1.5  7.5   1.0
   7      1   1   8               4.5  1    4.5              1    1    8



This representation of the stimulus information as a vector of prototype
activation values has been termed "multicategorical" by Repp (1976b). The final
category label is determined by a decision process that selects from the
multicategorical vector the prototype with the highest activation level. We may
assume that there is noise in the system, so that the decision process is
probabilistic in nature. For the sake of simplicity, the numbers in Table 4
have been chosen to be roughly proportional to the probabilities of
identification responses in the respective categories (according to the data in
Repp, 1976b). They add up to a fixed sum for each stimulus, implying that each
stimulus leads to the same degree of total activation in the system.
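
The relation between activation values and labeling described above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the vectors are the single-channel values as they can be read from Table 4, the function names are hypothetical, and the decision rule shown is the noise-free limiting case of the probabilistic process assumed in the text.

```python
# Single-channel activation vectors (B, D, G) as read from Table 4.
# Assumption: values transcribed from a poorly legible copy of the table.
SINGLE_CHANNEL = {
    1: (8, 1, 1), 2: (7, 2, 1), 3: (2, 6, 2), 4: (1, 7, 2),
    5: (1, 4, 5), 6: (1, 2, 7), 7: (1, 1, 8),
}
CATEGORIES = ("B", "D", "G")


def label_probabilities(vector):
    """Normalize activations so they can be read as response probabilities."""
    total = sum(vector)
    return {cat: value / total for cat, value in zip(CATEGORIES, vector)}


def most_likely_label(vector):
    """Noise-free decision rule: pick the prototype with highest activation."""
    probs = label_probabilities(vector)
    return max(probs, key=probs.get)


for stimulus, vector in SINGLE_CHANNEL.items():
    print(stimulus, label_probabilities(vector), most_likely_label(vector))
```

With these values every vector sums to 10, so dividing by the sum gives the identification probabilities directly (for example, 0.8 for B in stimulus 1).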



Let us now assume that a listener bases his discrimination responses not
on the phonetic labels but on the multicategorical vectors, even in
single-channel discrimination. An appropriate index for the discriminability of
two vectors is the Euclidean distance between them (in a three-dimensional
"prototype space," in the present example), which is equal to the square root of
the sum of squared differences between corresponding elements. This
discriminability index, (Σd²)^½, is displayed for one-step discriminations in
Table 4 ("single-channel"), as calculated from the hypothetical multicategorical
vectors. It is evident that the index is maximal across category boundaries and
minimal within categories, just like the obtained (and predicted) single-channel
discrimination functions. Therefore, the assumption that listeners discriminate
multicategorical vectors rather than phonetic labels is plausible and can, at
least in principle, account for the categorical perception of single syllables.
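
As a worked illustration of the index just defined, the following Python sketch computes the Euclidean distance for each one-step pair. The vectors are the single-channel (B, D, G) values as they can be read from Table 4, which is an assumption given the state of the copy; the function name `discriminability` is hypothetical.

```python
import math

# Single-channel vectors (B, D, G) for stimuli 1 through 7, as read from
# Table 4 (assumption: transcribed from a poorly legible copy).
SINGLE_CHANNEL = [
    (8, 1, 1), (7, 2, 1), (2, 6, 2), (1, 7, 2),
    (1, 4, 5), (1, 2, 7), (1, 1, 8),
]


def discriminability(u, v):
    """Square root of the sum of squared element-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# One-step discriminations: 1 vs. 2, 2 vs. 3, ..., 6 vs. 7.
one_step = [
    round(discriminability(u, v), 1)
    for u, v in zip(SINGLE_CHANNEL, SINGLE_CHANNEL[1:])
]
print(one_step)
# prints: [1.4, 6.5, 1.4, 4.2, 2.8, 1.4]
```

Under these assumed vectors the index peaks for 2 vs. 3 and 4 vs. 5, that is, at the B-D and D-G boundaries, and is small within categories.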

We are now only one step removed from the explanation of the dichotic
discrimination functions. In order to complete the argument, an assumption about
the nature of dichotic interaction is necessary. Repp (1976b) has already
argued from an analysis of the identification of dichotic fusions that dichotic
integration of information takes place at the level of multicategorical
representation and that the process is additive, that is, the multicategorical
vector of a dichotic pair is the sum of the multicategorical vectors of the
dichotic stimuli. When applied to our present problem, this leads immediately to
the insight that the addition of a constant vector to each of two vectors does
not change their discriminability, because it does not change the differences
between corresponding elements and, hence, leaves the discriminability index
unaffected. Therefore, one-ear discrimination functions should have the same




shape as single-channel discrimination functions, regardless of the nature of
the constant stimulus. This was in fact obtained, at least in good
approximation. However, the additivity assumption would predict no change in
discriminability at all, whereas the obtained one-ear discrimination performance
was considerably lower than single-channel performance. If we remember the
assumption that the total amount of activation produced by a single syllable is
constant and the fact that dichotic fusions sound like single syllables, it is
then plausible that dichotic integration is not a simple summation but an
averaging process that keeps the total activation constant. If each of two
vectors is averaged with a constant vector, their relative discriminability will
remain unchanged, but their absolute discriminability will decrease because
averaging reduces all differences to half their original size. This is
illustrated in Table 4 ("one-ear discrimination").
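
The contrast between summation and averaging can be verified numerically. The Python sketch below is a minimal illustration, assuming the Table 4 single-channel vectors for stimuli 2, 3, and 7; the helper names are hypothetical.

```python
import math


def distance(u, v):
    """Euclidean distance between two activation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def add(u, c):
    """Additive integration: element-wise sum with the constant vector."""
    return tuple(a + b for a, b in zip(u, c))


def average(u, c):
    """Averaging integration: element-wise mean with the constant vector."""
    return tuple((a + b) / 2 for a, b in zip(u, c))


# Assumed (B, D, G) vectors for stimuli 2 and 3 and constant stimulus 7.
x2, x3 = (7, 2, 1), (2, 6, 2)
constant = (1, 1, 8)

d_single = distance(x2, x3)
d_sum = distance(add(x2, constant), add(x3, constant))
d_avg = distance(average(x2, constant), average(x3, constant))

print(d_single, d_sum, d_avg)
```

Summation leaves the distance unchanged, whereas averaging halves it and keeps the total activation at 10; the averaged vector for stimulus 3 with constant 7 comes out as (1.5, 3.5, 5.0).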

In order to account for ear dominance effects, we finally stipulate that
dichotic integration consists in the weighted averaging of multicategorical
vectors (x, y, z). The weights (a, b; a + b = 1.0) represent the relative
dominance of each ear. Our model of dichotic integration is then: ax + by = z.
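
A minimal Python sketch of this weighted-averaging model follows. It is an illustration only: the function name `integrate`, the example weight of 0.6, and the choice of x as the variable stimulus and y as the constant stimulus are assumptions, and the vectors are the Table 4 single-channel values for stimuli 2, 3, and 7.

```python
import math


def integrate(x, y, a):
    """Weighted average z = a*x + b*y with b = 1 - a (so a + b = 1.0)."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("weight a must lie between 0 and 1")
    b = 1.0 - a
    return tuple(a * xi + b * yi for xi, yi in zip(x, y))


def distance(u, v):
    """Euclidean distance between two activation vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


# Assumed (B, D, G) vectors: variable stimuli 2 and 3, constant stimulus 7.
x2, x3 = (7, 2, 1), (2, 6, 2)
constant = (1, 1, 8)
a = 0.6  # hypothetical dominance weight of the ear receiving the variable stimulus

z2 = integrate(x2, constant, a)
z3 = integrate(x3, constant, a)

print(z2, z3)
print(sum(z2), distance(z2, z3), a * distance(x2, x3))
```

In this sketch the total activation of the fused vector stays at the single-syllable level, and the distance between two fusions sharing a constant stimulus is the single-channel distance scaled by the weight a. With a = b = 0.5 the model reduces to the simple averaging case.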

This relatively simple model provides a good qualitative account of the
data.⁸ (A quantitative formulation is straightforward, and tests of a formal
model are now in progress.) Note the dissociation of labeling and
discrimination responses that occurs in dichotic fusions. By adding a constant
stimulus to stimuli from a place continuum, the labeling functions are strongly
biased toward the constant stimulus (cf. Table 4, assuming that the prototype
activation values represent the probabilities of the corresponding responses;
see also Repp, 1976b: Figure 1). On the other hand, discriminability remains
independent of the constant stimulus and simply drops in absolute level, leaving
the pattern unchanged. The fact that single-channel discrimination functions can
be predicted from single-channel labeling functions may be a coincidence. The
fact that even single-channel performance is usually somewhat better than
predicted may be cited as additional (weak) evidence that discrimination is
based not on the phonetic labels but on a lower-level representation.



⁸Those shifts in the discrimination peaks that were observed in Experiments IA
and IB (primarily for the author as a subject) probably do not reflect
individual tendencies to make some discriminations on the basis of phonetic
labels, since it seems difficult to account for any peaks from phonetic
discrimination alone (cf. the predicted dichotic functions in Figure 1, top). A
finding from the earlier identification experiment is relevant here: the author
showed a much stronger tendency toward "psychoacoustic fusions" (hearing D when
/bæ/-/gæ/ is presented) than most other subjects. Repp (1976b) argued for an
explanation of psychoacoustic fusions at the multicategorical level, but it may
be that Cutting (1976) is right in hypothesizing a lower-level (probabilistic)
auditory averaging process. Such auditory averaging, if it occurs, would
precede the establishment of the (single) multicategorical code, and it would
destroy additivity and result in a shift of discrimination peaks. The finding
that primarily the author showed such shifts and that they occurred especially
in the region of /bæ/-/gæ/ contrasts (cf. also Figure 1) supports this
explanation. Therefore, in order to account for the detailed response pattern, a
two-stage model may be necessary. It seems, however, that auditory averaging
plays only a minor role for most subjects.




The present model makes the distinction between phonetic (discrete) and
auditory (continuous) discrimination unnecessary, at least in the present
context (Fujisaki and Kawashima, 1970; Pisoni, 1971; Pisoni and Lazarus, 1974).
The multicategorical vector is a code consisting of several discrete elements
that assume continuous values, and it is therefore both discrete and continuous.
More sensitive discrimination tasks will lead to better performance than less
sensitive ones (Pisoni and Lazarus, 1974) by changing the criterion for the
detection of differences between vectors; it is no longer necessary to invoke
auditory memory to account for this finding. Most likely, the multicategorical
vector is also the basis for confidence judgments and ratings of category
goodness (Barclay, 1972; Vinegrad, 1972; Summerfield, 1975; Cooper, Ebert, and
Cole, 1976). It is useful to consider the multicategorical vector as the
stimulus code on which the human listener operates according to the demands of
the task. Deciding upon phonetic labels is only one of these possible tasks,
and other tasks such as discrimination or rating are probably not more based
on labels than identification is based on implicit discriminations or ratings.
The notion of an intermediate, "multicategorical" stage may contribute to the
understanding of various problems in speech perception that so far have been
viewed in the light of the ubiquitous auditory-phonetic dichotomy
(Studdert-Kennedy, in press).

REFERENCES

Barclay, J. R. (1972) Noncategorical perception of a voiced stop: A
    replication. Percept. Psychophys. 11, 269-273.

Cherry, E. C. and B. McA. Sayers. (1956) "Human 'cross-correlator'": A
    technique for measuring certain parameters of speech perception. J. Acoust.
    Soc. Am. 28, 889-895.

Cooper, W. E. (1974) Adaptation of phonetic feature analyzers for place of
    articulation. J. Acoust. Soc. Am. 56, 617-627.

Cooper, W. E., R. R. Ebert, and R. A. Cole. (1976) Perceptual analysis of
    stop consonants and glides. J. Exp. Psychol.: Human Perception and
    Performance 2, 92-104.

Cooper, W. E. and R. N. Nager. (1975) Perceptuo-motor adaptation to speech:
    An analysis of bisyllabic utterances and a neural model. J. Acoust. Soc.
    Am. 58, 256-265.

Cullen, J. K., Jr., C. L. Thompson, L. F. Hughes, C. I. Berlin, and D. S. Samson.
    (1974) The effects of varied acoustic parameters on performance in
    dichotic speech perception tasks. Brain Lang. 1, 307-322.

Cutting, J. E. (1976) Auditory and linguistic processes in speech perception:
    Inferences from six fusions in dichotic listening. Psych. Rev. 83, 114-140.

Eimas, P. D. (1963) The relation between identification and discrimination
    along speech and non-speech continua. Lang. Speech 6, 206-217.

Eimas, P. D. and J. D. Corbit. (1973) Selective adaptation to linguistic
    feature detectors. Cog. Psychol. 4, 99-109.

Fujisaki, H. and T. Kawashima. (1970) Some experiments on speech perception
    and a model for the perceptual mechanism. In Annual Report of the
    Engineering Research Institute (Faculty of Engineering, University of
    Tokyo) 29, 207-214.

Halwes, T. G. (1969) Effects of dichotic fusion on the perception of speech.
    Unpublished Ph.D. dissertation, University of Minnesota.

Kuhn, G. M. (1973) The Phi coefficient as an index of ear differences in
    dichotic listening. Cortex 9, 447-457.


Liberman, A. M., K. S. Harris, H. S. Hoffman, and B. C. Griffith. (1957) The
    discrimination of speech sounds within and across phoneme boundaries.
    J. Exp. Psychol. 54, 358-368.

Pisoni, D. B. (1971) On the nature of categorical perception of speech sounds.
    Unpublished Ph.D. dissertation, University of Michigan.

Pisoni, D. B. and J. H. Lazarus. (1974) Categorical and noncategorical modes
    of speech perception along the voicing continuum. J. Acoust. Soc. Am. 55,
    328-333.

Pollack, I. and D. B. Pisoni. (1971) On the comparison between identification
    and discrimination tests in speech perception. Psychon. Sci. 24, 299-300.

Repp, B. H. (1976a) Effects of fundamental frequency contrast on
    discrimination and identification of dichotic CV syllables at various
    temporal delays. Mem. Cog. 4, 75-90.

Repp, B. H. (1976b) Identification of dichotic fusions. Haskins Laboratories
    Status Report on Speech Research SR-45/46 (this issue). [Also J. Acoust.
    Soc. Am. (in press).]

Speaks, C. and L. J. Bissonette. (1975) Interaural-intensive differences and
    dichotic listening. J. Acoust. Soc. Am. 58, 893-898.

Studdert-Kennedy, M. (in press) Speech perception. In Contemporary Issues in
    Experimental Phonetics, ed. by N. J. Lass. (New York: Academic Press).

Summerfield, A. Q. (1975) Cues, contexts, and complications in the perception
    of voicing contrasts. Speech Perception: Report on Speech Research in
    Progress (Psychology Department, The Queen's University of Belfast) Series
    2, no. 4, 99-130.

Vinegrad, M. D. (1972) A direct magnitude scaling method to investigate
    categorical versus continuous modes of speech perception. Lang. Speech 15,
    114-121.

Young, L. L., Jr., C. Parker, and R. Carhart. (1975) Coherence function for
    speech. J. Acoust. Soc. Am., Suppl. 58, S54 (A).






Coperception: Two Further Preliminary Studies
Bruno H. Repp*



ABSTRACT 

    Two "same-different" reaction-time studies were conducted to
    investigate the temporal limits of perceptual integration
    ("coperception") in speech perception, as measured by the influence of
    irrelevant context on the latencies of judgments about designated "target"
    segments. The first study varied the duration of the silent closure
    period of a medial stop in synthetic vowel-consonant-vowel (VCV)
    syllables. Surprisingly, the implosive and explosive transitions of the
    stop consonant (the target) were "coperceived," together with the
    final vowel (the irrelevant context), over as much as 200 msec of
    intervening silence. The second study varied the duration of the vowel
    in VC syllables and found no coperception of the vowel (target) with
    the final consonant (context) at all. Both results may reflect the
    degree of discriminability of the particular target phonemes and
    contexts used, so that further studies will be necessary to determine
    the generality of the present findings and to elucidate the role of
    discriminability in coperception.

INTRODUCTION

In an earlier report (Repp, 1975), I introduced the term "coperception" to
denote a certain class of contextual effects in speech perception. Coperception
was defined, in analogy to coarticulation, as the influence of one (phonemic)
segment on the perception of another (phonemic) segment in an utterance. The
measure of perception was stipulated to be reaction time (of same-different
judgments, classification, or detection); that is, it was presupposed that the
speech signal is fully intelligible. This excludes phenomena such as masking
from the definition of coperception.

The notion of coperception is a direct extension of Garner's (1974) concept
of stimulus integrality to the temporal domain. An extension to the spatial
domain in vision has recently been undertaken by Pomerantz and Garner (1973) and
Pomerantz and Schwaitzberg (1975), whose work provides a parallel to the present



*Also University of Connecticut Health Center, Farmington.

Acknowledgment: This research would not have been possible without the
generous hospitality of Haskins Laboratories and its director, Alvin Liberman.
The author was supported by NIH Grant T22 DE06202 to the University of
Connecticut Health Center.



[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]



approach. Garner's theory and methods have been applied extensively to the
perception of the simultaneously present dimensions of single stimuli (cf.
Garner and Felfoldy, 1970, and Garner, 1974, in vision; Wood, 1975a, 1975b, in
speech perception). The problem may be formulated in terms of selective
attention (e.g., Wood and Day, 1975): a stimulus is called integral if its
individual dimensions or components cannot be attended to without taking its
other dimensions into account. A speech signal is a multidimensional auditory
event that extends over time, just as visual stimuli extend into space. In both
cases, there are obvious limits to stimulus integrality, or coperception. When
two visual stimuli are sufficiently separated in space, they will cease to be
integral (Pomerantz and Schwaitzberg, 1975). Likewise, if two speech segments
are sufficiently separated in time, they will no longer be coperceived. By
varying the spatial or temporal structure of the stimulus events, the factors
that lead to coperception in the appropriate modalities may be explored. In
vision, there is good reason to believe that our intuitions about what forms a
good Gestalt will be relevant (Pomerantz and Schwaitzberg, 1975). We may ask the
analogous question in speech perception, and in auditory perception in general:
What portions of the auditory signal represent a "Gestalt," and what are the
properties that define it?

Pisoni and Tash (1974) and Wood and Day (1975) have demonstrated that an
initial stop consonant and the following vowel are such an auditory Gestalt.
The relevant factor here may be the absence of any acoustic segmentation
corresponding to the two phonetic segments, especially when the initial stop
consonant is voiced and has no "burst," that is, when it is represented only by
the initial transitions of the vowel. In other words, the continuity of the signal
may be a crucial factor in coperception, as it is in the perception of temporal
order (Dorman, Cutting, and Raphael, 1975). However, consider a medial stop
consonant, as in /abi/. Here, the implosive transitions of the initial vowel are
separated from the explosive transitions into the final vowel by a silent closure
period. (In natural speech, low-intensity voicing may continue through the
closure.) Are the two portions of the auditory signal, which separately are
heard as /ab/ and /bi/, still coperceived across the gap separating them? It
has been demonstrated (Repp, 1975) that they are, as is intuitively suggested by
the fact that only a single consonant is heard.

The present paper reports two further preliminary studies. They are
considered preliminary because their results suggest additional factors that will
have to be taken into account in research on coperception. Therefore, these
studies will primarily serve to illustrate and discuss some methodological
issues. Their results cannot be considered conclusive.

The first experiment was concerned with the limits of the coperception
effect in vowel-consonant-vowel (VCV) syllables: If the silent closure period
is extended in duration, when will coperception of implosive and explosive
transitions (plus the final vowel) cease? The prediction was straightforward: at a
certain separation, not one but two (geminate) consonants will be heard, for
example, /ab-bi/ (Delattre, 1971; Dorman, Raphael, Liberman, and Repp, 1975),
and this closure duration was expected to mark the end of coperception.

The second study investigated coperception in vowel-consonant (VC)
syllables. Pisoni and Tash (1974) and Wood and Day (1975) have shown that, in
consonant-vowel (CV) syllables, judgments about the vowel are influenced by
variations in the initial consonant, although the vowel has a steady state that is
entirely independent of the consonant. Apparently, the fact that the consonant
(i.e., the formant transitions) precedes the vowel is important here. In VC
syllables, on the other hand, the steady state of the vowel precedes the final
consonant. Will the consonant still be coperceived with the vowel? Clearly, if the
vowel is sufficiently long, a response to the vowel can be made before the final
transitions even enter the ear, so there must be a limit to coperception. In
order to investigate this limit, the duration of the vowel was varied
systematically.

Both studies reported here used same-different paradigms, based on the
assumption that the results would be comparable to those obtained in a
speeded-classification paradigm, the more traditional technique for assessing
stimulus integrality (Garner, 1974). While there is little evidence to suggest the
contrary, the two paradigms nevertheless differ in important respects, and it will
be necessary to compare the two techniques in future studies. In the
same-different task, two utterances are presented in succession, and the listener is
asked to judge whether a certain well-defined segment is the same or different
in the two stimuli, while other irrelevant segments vary randomly. Coperception
is said to exist when "same" judgments are facilitated by identity of the
contexts (relative to nonidentical contexts) and/or when "different" judgments are
facilitated by nonidentity of the contexts (relative to identical contexts).
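
The definition above can be restated as a pair of reaction-time differences. The following is a minimal sketch, not the analysis used in the study; the function name, the dictionary layout, and the latencies in the usage example are hypothetical and serve only to show the direction of each comparison.

```python
def coperception_effects(mean_rt):
    """Return the context effects (msec) for "same" and "different" judgments.

    mean_rt maps (response, context) to a mean correct reaction time, where
    response is "same" or "different" (the correct judgment about the target
    segment) and context is "identical" or "nonidentical" (the irrelevant
    segment). Positive values indicate facilitation in the direction that
    defines coperception.
    """
    same_effect = mean_rt[("same", "nonidentical")] - mean_rt[("same", "identical")]
    diff_effect = mean_rt[("different", "identical")] - mean_rt[("different", "nonidentical")]
    return {"same": same_effect, "different": diff_effect}


# Hypothetical latencies, for illustration only (not data from the study).
example = {
    ("same", "identical"): 400.0,
    ("same", "nonidentical"): 430.0,
    ("different", "identical"): 480.0,
    ("different", "nonidentical"): 465.0,
}
effects = coperception_effects(example)
```

A positive "same" value means that identical contexts speeded "same" judgments; a positive "different" value means that nonidentical contexts speeded "different" judgments.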

EXPERIMENT I

Method 

Subjects. Eight paid volunteers participated. All were native speakers of
English, had normal hearing, and had little experience in reaction-time tasks.

Stimuli. Four VCV syllables (/abi/, /adi/, /abe/, and /ade/) were
synthesized on the Haskins Laboratories parallel resonance synthesizer. They
consisted of two acoustic segments, 200 and 300 msec long, respectively, separated
by a variable silent gap. The first segment included an initial steady state
followed by 45-msec implosive transitions that did not vary with the final
vowel (i.e., they were identical in /abi/ and /abe/ and in /adi/ and /ade/).
The second segment began with 45-msec explosive transitions (independent of the
initial vowel) and ended in a steady state. The durations of the silent closure
period were 50, 100, 150, and 200 msec, resulting in total stimulus durations of
550, 600, 650, and 700 msec, respectively.

An experimental tape containing pairs of these stimuli was recorded using
the pulse code modulation system at Haskins Laboratories. The stimulus onset
asynchrony within a pair was 1 sec and constant (the interstimulus interval
varied with the duration of the closure period), and the interpair interval was
3 sec. The tape contained first a short practice series (eight pairs at each
closure duration), which was followed by four blocks of 80 pairs each. Each
block corresponded to a particular closure duration and contained five subblocks
(not separated by pauses), each containing the 16 possible combinations of the
four syllables in random order.
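
The timing and block sizes described above follow from simple arithmetic, which the sketch below spells out. The constant and function names are hypothetical, and the interstimulus intervals are derived here (offset of the first syllable to onset of the second), not quoted from the text.

```python
FIRST_SEGMENT_MS = 200    # initial vowel plus implosive transitions
SECOND_SEGMENT_MS = 300   # explosive transitions plus final vowel
CLOSURES_MS = (50, 100, 150, 200)
SOA_MS = 1000             # onset-to-onset interval within a pair
SYLLABLES = ("abi", "adi", "abe", "ade")
SUBBLOCKS_PER_BLOCK = 5


def total_duration(closure_ms):
    """Total duration of one VCV stimulus for a given closure duration."""
    return FIRST_SEGMENT_MS + closure_ms + SECOND_SEGMENT_MS


def interstimulus_interval(closure_ms):
    """Silent interval between the two syllables of a pair (derived value)."""
    return SOA_MS - total_duration(closure_ms)


durations = [total_duration(c) for c in CLOSURES_MS]
intervals = [interstimulus_interval(c) for c in CLOSURES_MS]
# Each subblock holds all ordered pairs of the four syllables.
pairs_per_block = SUBBLOCKS_PER_BLOCK * len(SYLLABLES) ** 2
```

With a constant onset asynchrony, lengthening the closure by 50 msec shortens the silent interval between the two syllables by the same amount.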

Procedure. The subjects were tested individually in a single session
lasting about 90 minutes. Each subject listened to the experimental tape twice.
The four blocks were presented once in ascending order (i.e., with closure
duration increasing) and once in descending order, counterbalanced between subjects.
The assignment of the hands to the two response keys ("same-different") was also
counterbalanced between subjects. The subjects were instructed to respond as
quickly and as accurately as possible. Before the experiment, they were told
exactly what the stimuli represented, that they would tend to hear two identical
consonants at the longest closure duration(s), and that they should judge the
consonants only and ignore the variation in the final vowel. It was stressed
that they should respond as soon as they could reach a decision and not wait for
the end of the utterance.

The tape was played back from an Ampex AG-500 tape recorder through a mixer
to Telephonics TDH-39 earphones. The intensity was set at a comfortable level
(about 75 dB SPL). The syllables, which had been recorded on separate channels,
were presented binaurally after electronically mixing the two channels. The
onset of the first syllable in a pair triggered a Hewlett-Packard 522B electronic
counter that was stopped by the subject's depression of one of the two response
keys. The reaction time was recorded to the nearest millisecond, together with
the kind of response given.

The stimulus onset asynchrony (1 sec) was subtracted from the reaction
times, so that they were measured from the onset of the second syllable in a
pair. (This is how the reaction times are given below. In order to obtain the
latencies with reference to the onset of the silent closure period in the second
syllable, as in Repp, 1975, another 200 msec should be subtracted.) Prior to
analysis, median reaction times were calculated for the five replications of the
same stimulus pair in each block, omitting errors. Further analysis was in
terms of the means of these medians.
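
The scoring steps just described (subtract the onset asynchrony, take the median over correct replications, then average the medians) can be sketched as follows. This is an illustration under stated assumptions, not the original analysis procedure: the function names, the trial representation, and the counter readings are all hypothetical.

```python
from statistics import mean, median

SOA_MS = 1000  # stimulus onset asynchrony within a pair (Experiment I)


def median_correct_rt(trials, soa_ms=SOA_MS):
    """Median latency from second-syllable onset over correct replications.

    trials is a list of (raw_rt_ms, correct) tuples for one stimulus pair in
    one block; raw_rt_ms is timed from the onset of the first syllable.
    Returns None if every replication was an error.
    """
    correct = [rt - soa_ms for rt, ok in trials if ok]
    return median(correct) if correct else None


def mean_of_medians(cells):
    """Average the per-pair medians, skipping pairs with no correct response."""
    medians = [m for m in (median_correct_rt(t) for t in cells) if m is not None]
    return mean(medians)


# Hypothetical counter readings for two stimulus pairs (five replications each).
pair_a = [(1410, True), (1395, True), (1520, False), (1430, True), (1405, True)]
pair_b = [(1450, True), (1480, True), (1440, True), (1465, True), (1600, True)]
summary = mean_of_medians([pair_a, pair_b])
```

Using medians within each cell limits the influence of occasional very long latencies, such as the 1600-msec reading in the second hypothetical pair.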

Results and Discussion 

Assuming that the basic effect of coperception is replicated at the
shortest closure duration (50 msec), there are two patterns the results may follow.
If the subjects were able to rely increasingly on the implosive transitions alone
as closure duration was lengthened, the difference between reaction times as a
function of context should decrease to zero, and the absolute latencies should
not be affected by closure duration. This was the expected outcome. On the
other hand, the null hypothesis is that at all closure intervals the subjects
would rely on the explosive transitions alone. In this case, not only should
the context effect remain constant, but the latencies should increase as a
linear function of closure duration with a slope of unity. This is because
latencies are measured from the onset of the VCV syllable, and an increase in
closure duration means that the listener has to wait that much longer before he
hears the explosive transitions and can reach a decision.
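
The two predictions about absolute latencies can be written down directly. The sketch below is a simplification that ignores the context effect and treats the base latency (400 msec) as a hypothetical value; the function names are likewise illustrative.

```python
CLOSURES_MS = (50, 100, 150, 200)


def predicted_latencies(base_rt_ms, uses_explosive_only):
    """Predicted latency at each closure duration, relative to syllable onset.

    base_rt_ms is a hypothetical latency at the shortest closure duration.
    If listeners wait for the explosive transitions, every extra msec of
    closure adds one msec of latency (slope of unity); if they can rely on
    the implosive transitions, latency does not depend on closure duration.
    """
    shortest = CLOSURES_MS[0]
    slope = 1.0 if uses_explosive_only else 0.0
    return [base_rt_ms + slope * (c - shortest) for c in CLOSURES_MS]


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


null_prediction = predicted_latencies(400.0, uses_explosive_only=True)
expected_prediction = predicted_latencies(400.0, uses_explosive_only=False)
null_slope = least_squares_slope(CLOSURES_MS, null_prediction)
expected_slope = least_squares_slope(CLOSURES_MS, expected_prediction)
```

Fitting a slope to observed mean latencies and comparing it with 1 and 0 is one simple way to see which pattern the data resemble.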

The outcome is shown in Figure 1. Surprisingly, it is in close agreement
with the null hypothesis. It can be seen that all reaction times increased with
slopes close to unity, especially at the longer closure durations; the flatter
slopes at the short durations probably reflect a floor effect. It is also
evident that at all closure durations "same" reaction times were faster when the
final vowel was the same than when it was different; that is, coperception was
present and persisted up to the longest interval. Only the "different" reaction
times show an interaction: at the shortest closure duration, they were faster
when the final vowels were different, as predicted; but there was no such
difference at the longer durations. "Different" reaction times were considerably
slower than "same" reaction times, which is a common finding in tasks of this
sort.

Figure 1: Average median reaction times as a function of closure duration
(msec; 50, 100, 150, and 200). "Same" and "different" latencies are shown for
identical and nonidentical vowel contexts. [Plot not reproduced.]



Figure 1 is actually not very representative of the individual data, which
showed substantial variation. In view of this variation, and of the negative
result, no statistical analysis was deemed necessary. The data were considerably
more variable than in the previous similar experiment (Repp, 1975). The only
effect consistently shown by all subjects was the linear increase in reaction
times with closure duration. Only four of the eight subjects actually showed a
positive coperception effect in their "same" reaction times (but then a very
large one, which accounts for the average positive effect). This is in contrast
to the previous results (Repp, 1975), where all 12 subjects showed a positive
effect. The coperception effect on "different" reaction times was similarly
variable, and there were also surprisingly large variations within the data of
individual subjects. Consequently, the only reliable finding exhibited in
Figure 1 is the linear increase in reaction times. However, this result is
sufficient to suggest that all listeners made their judgments on the basis of the
explosive transitions alone.



It is interesting to observe that the error pattern did not closely
correlate with the latencies. The average error rate was 6.3 percent, with
individuals varying between 1.8 and 12.0 percent. The errors decreased by about
one-third from the first to the second half of the session. The pattern is shown
in Table 1.



TABLE 1: Average error percentages as a function of
closure duration and type of stimulus pair.

                                   Closure duration (msec)
Correct response   Context        50     100    150    200    Mean

"Same"             Same           3.4    3.1    1.9    5.3    3.4
                   Different     10.3   10.0    9.1    7.8    9.3

"Different"        Same           8.8    5.0   10.0    5.0    7.2
                   Different      5.3    3.1    1.1    5.0    5.2

Mean                              7.0    5.3    7.1    5.4    6.3



It can be seen that the error rates at closure durations of 100 and 200
msec were lower than at 50 and 150 msec; by no means did the errors follow the
linear increase observed for the reaction times. The fluctuation was due to
incorrect "same" responses (lower part of Table 1); the frequencies of incorrect
"different" responses were fairly constant. The effect of context is reflected
in the error frequencies: there were fewer incorrect "different" responses when
the context was the same than when it was different, and fewer incorrect "same"
responses when the context was different than when it was the same. The first
effect was present at all closure durations but reduced at 200 msec, while the
second effect was less pronounced but present at all closure durations except
200 msec. Thus, the error rates are suggestive of some change in processing at
the longest closure duration.

It would be naive to take the results at face value and conclude that
implosive and explosive transitions are perceptually integrated over a total
period of almost 300 msec, even if this is still within the upper limits of the
acoustic store postulated by Massaro (1972, 1974). It is also unlikely that all
eight subjects failed to obey the instructions, which were clear enough. One
possibility is that the prediction that coperception would cease as soon as
geminate consonants are heard was essentially correct but that the naive
subjects still heard only a single consonant at the longest closure duration. The
closure durations were selected by the author, who clearly heard geminate
consonants with the 200-msec closure period but not at the shorter durations. It is
a shortcoming of the experiment that "single versus geminate" judgments were not
elicited in a control condition.

Another deficiency of this study was that it did not test whether the
subjects actually were able to discriminate the implosive transitions in isolation.
Clearly, if they could not tell /ab/ and /ad/ apart, they would have had to rely
on the explosive transitions in the VCVs. However, four of the subjects
participated in Experiment II, described below, which included VC syllables identical
with the first portion of the VCV syllables of the present study. Three subjects
were able to discriminate them without much difficulty, and only one subject
failed. Since all subjects showed a linear increase in reaction time in the
present experiment, failures to discriminate the implosive transitions are not
a likely explanation for the results.

However, it is well-known to those working with synthetic speech that
implosive transitions are not easy to discriminate, especially in the absence of
the release burst that natural syllable-final stops often show. This low
"salience" of implosive transitions may be the reason that the explosive
transitions determine the perceived place of articulation when implosive and explosive
transitions in VCV syllables are artificially brought into conflict (Dorman,
Raphael, Liberman, and Repp, 1975; Fujimura, 1975). It also may lie at the
heart of the present problem. Difficult discriminations are necessarily
associated with long decision times. Let us assume that the subjects heard the
implosive transitions at the longest closure duration; that is, they heard geminate
consonants. It is then possible that they attempted to reach a decision as soon
as they heard the syllable-final stop, but that the decision process was not yet
completed by the time the syllable-initial stop arrived. This may have
interrupted the ongoing decision process, or it may have initiated a separate decision
process of its own, which overtook the earlier process. For example, if the
decisions for the syllable-final consonants lasted about 300 msec longer than the
decisions for the syllable-initial consonants, on the average, the latter would
have been completed earlier than the former in most cases. Although such a
large difference is rather unlikely, the hypothesis needs to be tested by asking
subjects to discriminate implosive transitions in isolation (i.e., in VC
syllables).¹
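
The "overtaking" argument can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that each decision starts when its transitions end and that decision times are fixed rather than variable; the 250-msec decision time and all names are hypothetical.

```python
FIRST_SEGMENT_MS = 200   # initial vowel plus implosive transitions
TRANSITION_MS = 45       # duration of the explosive transitions


def finishing_times(closure_ms, initial_decision_ms, extra_final_ms):
    """Completion times (msec from syllable onset) of the two decisions.

    Assumes, for illustration, that each decision starts when its
    transitions end. extra_final_ms is how much longer the decision about
    the syllable-final (implosive) cue takes than the decision about the
    syllable-initial (explosive) cue.
    """
    final_done = FIRST_SEGMENT_MS + initial_decision_ms + extra_final_ms
    initial_done = FIRST_SEGMENT_MS + closure_ms + TRANSITION_MS + initial_decision_ms
    return final_done, initial_done


final_done, initial_done = finishing_times(
    closure_ms=200, initial_decision_ms=250, extra_final_ms=300
)
explosive_wins = initial_done < final_done
```

Under these assumptions the implosive cue has a head start of the closure duration plus 45 msec, that is, 245 msec at the longest closure, so a 300-msec difference in decision time would let the later decision finish first.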



The tentative conclusion from the preceding paragraph is that the
discriminability of the target segments may play an important role and should be
included as a parameter in studies of coperception, whenever possible. Wood and Day
(1975) have discussed the same problem in the context of the
speeded-classification paradigm. Unfortunately, in the case of syllable-final transitions, not
much can be done to improve discriminability. Perhaps the /ab/-/ag/ contrast
will prove easier to discriminate than the /ab/-/ad/ contrast, because of the
larger acoustic difference in the transitions. In addition, a future experiment
might employ VCV and VC syllables in the same design, which should direct the
subjects' attention to the syllable-final consonants. Practice in
discriminating implosive transitions may also reduce the difficulty of the task. Finally,
the explosive transitions could be made less discriminable by making them
acoustically more similar, in order to increase the corresponding decision times and
to discourage the subjects from relying too much on the syllable-initial
consonants (if subjective strategies are involved at all). These approaches will
have to be tried out in future experiments.

¹In fact, this suggests an alternative explanation of the increase in reaction
times with closure period duration. It may be that the subjects did make
decisions on the basis of the implosive transitions with a certain probability that
increased with closure duration (perhaps only at the longest closure duration).
The reaction times would then represent a mixture of two distributions (slow
latencies for implosive transitions and fast latencies for explosive
transitions), and the increase with closure duration would represent an increase in
the proportion of slow latencies. However, it would be a rare coincidence if
this kind of process had produced the linear functions shown in Figure 1, and
one should also have expected an increase in errors and a decrease in the
coperception effect as closure period was lengthened. Therefore, the explanation
seems rather unlikely.

EXPERIMENT II 

Method

Subjects. There were four subjects who had previously participated in
Experiment I.

Stimuli. The stimuli were the syllables /ab/, /ad/, /eb/, and /ed/,
synthesized on the Haskins Laboratories parallel resonance synthesizer. The final
transitions (45 msec) were preceded by a steady-state vowel of variable duration.
The total syllable durations were 100, 150, 200, 250, and 300 msec.

The experimental tape first contained a brief practice list of single
syllables for identification. It was followed by five blocks of 160 pairs each.
The 160 pairs consisted of the 16 possible combinations of the four syllables,
with two possible durations of the first syllable (150 or 250 msec) and five
possible durations of the second syllable, which were completely randomized.
The stimulus onset asynchrony was constant at 750 msec, and the interpair
interval was 3 sec.
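
The composition of one 160-pair block can be enumerated as a full factorial design. The sketch below is illustrative only: the variable names and the dictionary layout are hypothetical, and the randomization of presentation order is omitted.

```python
from itertools import product

SYLLABLES = ("ab", "ad", "eb", "ed")
FIRST_DURATIONS_MS = (150, 250)
SECOND_DURATIONS_MS = (100, 150, 200, 250, 300)

# One entry per stimulus pair: 16 syllable combinations x 2 x 5 durations.
trials = [
    {
        "first": first,
        "second": second,
        "first_ms": d1,
        "second_ms": d2,
        "same_vowel": first[0] == second[0],          # the target judgment
        "same_consonant": first[1] == second[1],      # the irrelevant context
    }
    for first, second, d1, d2 in product(
        SYLLABLES, SYLLABLES, FIRST_DURATIONS_MS, SECOND_DURATIONS_MS
    )
]
same_vowel_trials = sum(t["same_vowel"] for t in trials)
same_duration_trials = sum(t["first_ms"] == t["second_ms"] for t in trials)
```

The enumeration shows that half of the pairs call for a "same" vowel judgment and that only a minority of pairs have equal durations for the two syllables.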

Procedure. The procedure was similar to that in Experiment I, except that
the subjects were instructed to judge as rapidly as possible whether the vowels
were the same or different, ignoring vowel duration and the final consonant.
Reaction times were measured from the onset of the second syllable in a pair.
The analysis was performed on mean reaction times, omitting errors and
exceptionally long latencies.

Results and Discussion

Two hypotheses were tested in this experiment. One predicted that there
would be a coperception effect when the second syllable in a pair was
sufficiently short, and that this effect would disappear as the duration of the second
syllable was increased. The second hypothesis predicted, on the assumption that
fairly literal representations of the speech sounds are compared in the brain,
that reaction times (perhaps "same" responses only) would be shorter when the
first and the second syllable had the same duration. The results are shown in
Table 2.

Table 2 shows that the reaction times exhibited surprisingly little
variation (which attests to the reliability of the data). Neither hypothesis was
supported. There was no indication of any coperception effect, nor was there
any interaction with syllable duration. The only consistent difference was
between "same" and "different" latencies, a trivial finding. Although the results
were based on only four subjects, it seemed useless to run further subjects in
this task.

The average error rate was 4.9 percent. The pattern of errors with respect
to syllable duration is shown in Table 3.

TABLE 2: Mean reaction time as a function of syllable durations and type
of stimulus pair.

                                         Syllable duration (msec)
                          First: 150                    First: 250
Response     Context      Second: 100 150 200 250 300   Second: 100 150 200 250 300

"Same"       Same                 337 333 350 343 353           335 352 363 345 527
             Different            340 335 338 334 356           358 335 317 323 333

"Different"  Same                 366 356 340 366 343           357 349 374 357 365
             Different            363 353 373 359 361           354 350 358 351 387


TABLE 3: Average error percentages as a function of
syllable durations.

                          Duration of second syllable (msec)
Duration of
first syllable (msec)     100    150    200    250    300    Mean

150                       7.2    4.7    3.4    2.2    2.2    3.9
250                       9.7    6.3    5.9    4.1    3.1    5.8

Mean                      8.4    5.5    4.7    3.1    2.7    4.9

In contrast to the latencies, the error rates declined steadily as the
duration of the second syllable increased, but, surprisingly, they were higher with
the longer duration of the first syllable. At the shortest duration of the
second syllable, the error pattern was in agreement with a coperception effect
(not shown in Table 3), but there were too few observations to draw any
conclusions (and, moreover, coperception is defined in terms of reaction times,
although the error frequencies often show a positive correlation with the
latencies). No statistical analysis was conducted.

Why did the reaction times show no effect? The reason may be that only the
vowel onset matters and the information that follows is irrelevant. In other
words, final consonants may not be coperceived with the preceding vowel. Such a
conclusion would be highly interesting, but the present data do not justify it
yet. Rather, it is likely that discriminability again played a role. The vowel
discrimination was fairly easy, so that the decisions may have been completed
before the final consonant was processed. In addition, the final transitions were
not easy to discriminate, so that they were processed more slowly and therefore
could not affect the vowel decision any more. In order to have any detectable
effect, the context must be highly discriminable. It is planned to repeat the
experiment with VCV syllables and more similar (initial) vowel targets. This
should both increase the decision times for the vowel targets and decrease the
decision times for the following consonants (medial consonants are probably more
discriminable than final stops), which should improve the sensitivity of the
experiment.

A recent study by Healy and Cutting (in press) illustrates the problem of
target discriminability. They used a detection paradigm in which a subject
hears a list of utterances and responds to only one of them. They presented
isolated vowels and VC syllables and asked the subjects to detect either a
vowel or a VC syllable. Their subsequent comparison of vowel and syllable
detection latencies showed faster syllable detection latencies for vowels that
were difficult to discriminate (in a control condition) but faster vowel
detection latencies for vowels that were easy to discriminate. This provides
evidence that the final consonant may be coperceived with the preceding vowel,
given that the vowel is difficult to classify. Suggestive evidence comes also
from a recent study by Strange, Jenkins, and Edman (1975), who found that the
intelligibility of isolated vowels increases when they are followed by a stop
consonant, although, in this case, perceptual integration may have occurred at
a later stage. It is likely that a more sensitive experiment than the present
one will show coperception in VC syllables.

CONCLUSIONS

While the results of the present experiments are not conclusive, they have
been helpful in pointing out a methodological issue, perhaps more so than a
"positive" outcome. Nor are the results invalid; they merely represent a sample
from a whole continuum of stimulus discriminability. The discriminability of
both the target and the context will have to be a parameter in future studies of
coperception. It is likely that the limits of temporal integration in speech
perception depend on the ease of discrimination of successive portions of the
speech signal. If this is true, it means that there are no fixed "units" that
are processed successively but that a number of concurrent and overlapping
processes are triggered by the acoustic stimulus. The size of these processing
"units" depends on the clarity of the information. In other words, the speech
processor "accumulates evidence" until it can reach a decision. However, while
this may accurately describe its operation in reaction-time tasks,
generalizations to the processing of natural speech must be made with caution, because the
target of attention is usually not at the phonemic level. Coperception studies
reveal only the lower limits of perceptual integration, not its upper limits,
which may be at least as important in "normal" speech perception.

REFERENCES 

Delattre, P. (1971) Consonant gemination in four languages: An acoustic,
perceptual and radiographic study. International Review of Applied
Linguistics 9, 31-52; 9, 97-113.

Dorman, M. F., J. E. Cutting, and L. J. Raphael. (1975) Perception of temporal
order in vowel sequences with and without formant transitions. J. Exp.
Psychol.: Human Perception and Performance 1, 121-129.

Dorman, M. F., L. J. Raphael, A. M. Liberman, and B. H. Repp. (1975) Some
maskinglike phenomena in speech perception. Haskins Laboratories Status
Report on Speech Research SR-42/43, 265-276.

Fujimura, O. (1975) A look into the effects of context: Some articulatory and
perceptual findings. Paper presented at the 8th International Congress of
Phonetic Sciences, Leeds, England, 17-23 August.


Garner, W. R. (1974) The Processing of Information and Structure. (Potomac,
Md.: Lawrence Erlbaum Assoc.).

Garner, W. R. and G. L. Felfoldy. (1970) Integrality of stimulus dimensions in
various types of information processing. Cog. Psychol. 1, 225-241.

Healy, A. F. and J. E. Cutting. (in press) Units of speech perception:
Phoneme and syllable. J. Verbal Learn. Verbal Behav.

Massaro, D. W. (1972) Preperceptual images, processing time, and perceptual
units in auditory perception. Psychol. Rev. 79, 124-145.

Massaro, D. W. (1974) Perceptual units in speech recognition. J. Exp. Psychol.
102, 199-208.

Pisoni, D. B. and J. Tash. (1974) "Same-different" reaction times to
consonants, vowels, and syllables. In Research on Speech Perception (Department
of Psychology, Indiana University), Progress Report No. 1.

Pomerantz, J. R. and W. R. Garner. (1973) Stimulus configuration in selective
attention tasks. Percept. Psychophys. 14, 565-569.

Pomerantz, J. R. and S. D. Schwaitzberg. (1975) Grouping by proximity:
Selective attention measures. Percept. Psychophys. 18, 355-361.

Repp, B. H. (1975) "Coperception": A preliminary study. Haskins Laboratories
Status Report on Speech Research SR-42/43, 147-157.

Strange, W., J. J. Jenkins, and T. Edman. (1975) Identification of vowels in
CV and VC syllables. J. Acoust. Soc. Am., Suppl. 58, S59(A).

Wood, C. C. (1975a) Auditory and phonetic levels of processing in speech
perception. J. Exp. Psychol.: Human Perception and Performance 1, 3-20.

Wood, C. C. (1975b) A normative model for redundancy gains in speech
discrimination. In Cognitive Theory: Vol. 1, ed. by F. Restle, R. M. Shiffrin,
N. J. Castellan, H. Lindman, and D. B. Pisoni. (Potomac, Md.: Lawrence
Erlbaum Assoc.).

Wood, C. C. and R. S. Day. (1975) Failure of selective attention to phonetic
segments in consonant-vowel syllables. Percept. Psychophys. 17, 346-350.



"Posner's Paradigm" and Categorical Perception: A Negative Study
Bruno H. Repp*




ABSTRACT

A reaction-time study was conducted with four synthetic syllables
from a "place continuum" (/bae/-/dae/₁-/dae/₂-/gae/). A special
counterbalanced design was used to assess the effect of acoustic
similarity on reaction time. The study included a "same-different" and
a classification task, two different temporal delays between the
syllables, and binaural versus dichotic (i.e., alternating monaural)
presentation. However, no effects of auditory similarity were found,
which contradicts a recent study by Eimas and Miller (1975) that used
similar stimuli.

INTRODUCTION



Posner and Mitchell (1967) introduced an experimental paradigm that has led
to some of the most elegant and successful research in visual information
processing (e.g., Posner, 1969; Posner, Boies, Eichelman, and Taylor, 1969). The
task consists in judging whether two letters are the same or different, with
reaction time as the dependent variable. The two letters can be either
identical (AA) or different (AB); in addition, they can have the same name but be
physically different (Aa). The subjects are instructed to respond "same" when
the two letters have the same name, and "different" otherwise. The principal
finding is that "same" reaction times are faster for pairs that are physically
identical (AA) than for pairs that are physically different (Aa). This suggests
that physically identical letters can be matched at an earlier "node" in
processing, which uses purely visual information, while name matches in the absence
of physical identity take place at a later processing stage. Posner and Keele
(1967) introduced temporal delays between the two stimuli, in order to find out
whether the visual information that leads to the relative advantage for physical
matches is subject to decay. They found a steady decline of the reaction-time
difference over the first 2 sec, suggesting that the visual information is held
in a relatively short-lived store.
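
The three pair types of the letter-matching task, and the response the instructions call for, can be stated compactly. The sketch below is only an illustration of the task logic; the function names are hypothetical, and letter case stands in for physical form.

```python
def classify_letter_pair(first, second):
    """Classify a two-letter pair as in the letter-matching task.

    Returns "physical match" (e.g., AA), "name match" (e.g., Aa), or
    "different" (e.g., AB). Case stands in for physical form here.
    """
    if first == second:
        return "physical match"
    if first.lower() == second.lower():
        return "name match"
    return "different"


def correct_response(first, second):
    """The instructed response: "same" whenever the names match."""
    return "different" if classify_letter_pair(first, second) == "different" else "same"
```

Both physical matches and name matches require a "same" response, which is why any latency difference between them can be attributed to the level at which the match is made and not to the response itself.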

Similar paradigms have been profitably applied in speech perception (e.g.,
Springer, 1973; Cole, Coltheart, and Allard, 1974; Repp, 1976a). Perhaps the
most interesting of these studies is that of Pisoni and Tash (1974). They
applied Posner's paradigm to the classical problem of categorical perception.
It is well-known that initial stop consonants are easy to discriminate as long
as they are perceived as belonging to different categories, but that acoustic
differences within these categories are almost impossible to detect (e.g.,
Pisoni, 1971). It has been suggested that this phenomenon may be due to the
rapid loss of auditory information from memory (Fujisaki and Kawashima, 1970;
Pisoni, 1971, 1973). Pisoni and Tash (1974) presented two synthetic syllables
in close succession, which could be either physically identical (e.g.,
/ba/₁-/ba/₁), different acoustically but belonging to the same category
(/ba/₁-/ba/₂), or belonging to different categories (/ba/-/pa/). The acoustic
variable was voice onset time (VOT), the most important cue for the distinction
between /ba/ and /pa/. The listeners were not aware of these acoustic variations
and simply made "same-different" judgments with respect to the categories of the
syllables. Pisoni and Tash found significantly shorter "same" reaction times for
"physical matches" than for mere "name matches," just as Posner did. In addition,
they found "different" reaction times to decrease with the acoustic difference
between two syllables from different categories, which constitutes additional
evidence for the availability of auditory information. (The corresponding finding
in vision would be faster "different" latencies for Ab pairs than for AB pairs,
a condition that has rarely been included and then has not yielded a positive
effect; e.g., Besner and Coltheart, 1975.) Pisoni and Tash suggested a
two-stage processing model that allows for fast auditory matches to be conducted
before slower phonetic (name) matches. Stimuli that are either identical or
very different from each other may permit lower-level auditory decisions, while
more ambiguous cases are decided at the phonetic level.

*Also University of Connecticut Health Center, Farmington.

Acknowledgment: This research was made possible by the generous hospitality of
Haskins Laboratories and its director, Alvin Liberman. The author was supported
by NIH Grant T22 DE00202 to the University of Connecticut Health Center.

[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]

The Pisoni and Tash findings are especially interesting because, in contrast
to other Posner-type tasks, the subjects are not aware of the physical
differences within name categories; that is, no special "name match" instructions
are necessary, as in the letter-matching task. Again, the question arises
whether and how fast the auditory information is lost from memory. This may be
investigated by varying the interval between the two syllables that are to be
compared. I conducted such a study two years ago at the University of Chicago.¹
Pairs of syllables from a VOT continuum (ranging from /ba/ to /pa/, as in Pisoni
and Tash, 1974) were presented at stimulus onset asynchronies (SOAs) between 0
and 3.3 sec. There was a clear effect of acoustic differences on "different"
reaction times, which, moreover, did not decrease as the delay between the
syllables increased. However, in contrast to the findings of Pisoni and Tash
(1974), there was no clear evidence of any effect on "same" reaction times,
which is the primary evidence for the availability of auditory information.

The effect on "different" reaction times could have a different explanation.
It is well-known that it takes longer to classify stimuli that lie close to a
category boundary than stimuli that are far from the boundary (Studdert-Kennedy,
Liberman, and Stevens, 1963; Pisoni and Tash, 1974; Eimas and Miller, 1975;
Repp, 1975). Pairs of acoustically very discrepant stimuli necessarily contain
stimuli from the ends of the acoustic continuum, while pairs of acoustically



¹Repp, B. H. (1974) Categorical perception, auditory memory, and dichotic
interference. Unpublished manuscript. Copies of this paper are available from
the author upon request. Some of the results were presented at the 89th meeting
of the Acoustical Society of America in Austin, Texas (Repp, 1975).




more similar syllables (from different categories) contain at least one stimulus
that is close to the category boundary. Therefore, the differences in
categorization time for individual stimuli are confounded with the degree of acoustic
discrepancy in between-category comparisons, and the effect on "different"
reaction times could simply arise from the successive categorization and phonetic
comparison of the two syllables. This explanation would also predict that the
effect does not decrease with increasing SOA (Repp, 1975). This methodological
objection does not apply to the "same" reaction times, since the individual
stimuli contained in within-category comparisons can be properly counterbalanced
(as in the experiment of Pisoni and Tash, 1974). One reason my study did not
replicate theirs might have been the presentation of the two syllables to different
ears, while Pisoni and Tash had presented them binaurally.

The present study asked the following questions:

1. Is the Pisoni-Tash effect obtained with syllables from a "place
   continuum," that is, with syllables whose acoustic differences
   lie in the initial formant transitions (and which are also
   perceived in a highly categorical fashion; see Pisoni, 1971)?

2. If so, does this effect decrease as the temporal separation
   between the syllables is increased?

3. Is there a difference in the magnitude of the effect when the
   syllables are presented to different ears rather than binaurally?

4. Does the Pisoni-Tash effect really reflect auditory comparisons
   between the two syllables, or does it perhaps consist in an
   influence of the first syllable in a pair on the categorization
   time of the second syllable? Entus and Bindra (1970) and
   Eichelman (1970), among others, have provided evidence that
   "same-different" reaction times and sequential effects in simple
   choice-reaction time are related and may reflect the same
   underlying processes. This was investigated here by including a
   condition in which the subjects had to classify the second syllable
   in each pair, ignoring the first syllable.

The study used a design that avoids the methodological problem with
"different" reaction times discussed above. This design requires three categories
on a single acoustic continuum, which is the case with a place continuum
(/b/-/d/-/g/). Only four stimuli were used: /b/, /d/1, /d/2, and /g/. (The
vocalic context, /ae/, was constant.) /d/1 was acoustically closer to /b/ and
/d/2 was closer to /g/. The predictions were that "same" reaction times should
be faster for /d/1-/d/1 and /d/2-/d/2 than for /d/1-/d/2 and /d/2-/d/1, and
"different" reaction times should be faster for /b/-/d/2 and /g/-/d/1 (and their
reverse orders) than for /b/-/d/1 and /g/-/d/2 (and their reverse orders). It can
be seen that this design is completely balanced and therefore leads to
unconfounded results for both "same" and "different" reaction times.
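
The balance claimed for this design can be checked mechanically. The following is a minimal sketch, not part of the original study: the labels `b`, `d1`, `d2`, `g`, the integer continuum positions, and the names "near" and "far" are illustrative assumptions. It classifies each ordered pair and counts how often each stimulus enters the near and far between-category comparisons.

```python
from collections import Counter
from itertools import product

# Assumed encoding: ordinal positions on the place continuum.
POSITION = {"b": 0, "d1": 1, "d2": 2, "g": 3}
CATEGORY = {"b": "B", "d1": "D", "d2": "D", "g": "G"}


def pair_type(first, second):
    """Return (correct response, acoustic relation) for an ordered pair."""
    step = abs(POSITION[first] - POSITION[second])
    if CATEGORY[first] == CATEGORY[second]:
        return ("same", "identical" if step == 0 else "nonidentical")
    # Between categories: adjacent stimuli are "near", all others "far".
    return ("different", "near" if step == 1 else "far")


def stimulus_counts(relation):
    """Count how often each stimulus occurs in pairs with the given relation."""
    counts = Counter()
    for first, second in product(POSITION, repeat=2):
        if {first, second} == {"b", "g"}:
            continue  # /b/-/g/ pairs were not part of the design
        if pair_type(first, second)[1] == relation:
            counts[first] += 1
            counts[second] += 1
    return counts


# Every stimulus occurs equally often in near and far "different" pairs.
print(stimulus_counts("near") == stimulus_counts("far"))
```

Because each stimulus contributes equally to both kinds of "different" pair, differences in categorization time for individual stimuli cannot by themselves produce a near/far difference.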

METHOD

Subjects

Eight paid volunteers (five women and three men) from the Haskins-Yale
summer subject pool participated. Two of the men were left-handed. All had
normal hearing and were relatively inexperienced.






Stimuli

Four synthetic syllables were produced on the Haskins Laboratories parallel
resonance synthesizer. Two stimuli were supposedly good instances of /bae/ and
/gae/, respectively, while the other two both sounded like /dae/ (cf. Repp,
1976b; the constant vowel will be omitted in referring to the stimuli). The
two /d/s, /d/1 and /d/2, differed only in the onset frequencies of the second
formant (F2), which were 1620 and 1772 Hz, respectively. Since the steady-state
vowel had its F2 at 1620 Hz, /d/1 had a flat F2, while /d/2 had a falling
transition. The third formant (F3) fell from 3026 to 2862 Hz in both /d/s.
Likewise, /b/ and /g/ differed only in their F2 transitions (starting at 1232 and
2156 Hz, respectively) and had identical F3 transitions (starting at 2180 Hz).
All syllables were of 280-msec duration, with 15 msec of prevoicing, no bursts,
and a constant fundamental frequency (114 Hz).

Of the sixteen possible ordered pairs of the four syllables, /b/-/g/ and
/g/-/b/ were omitted and /b/-/b/ and /g/-/g/ were duplicated instead. This
resulted in an equal number of "same" and "different" pairs. Four stimulus lists
were recorded. Each contained 80 syllable pairs, viz. 5 blocked replications of
the 16 pairs, randomized within blocks. In the first and fourth lists, the SOA
was 500 msec; in the second and third lists, the SOA was 2 sec. Each stimulus
pair was preceded by a 100-msec warning buzz that came on 500 msec before the
first syllable. The two syllables in a pair were recorded on separate channels.
The interpair interval was 3 sec.
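
The list construction just described can be sketched as follows. This is an illustration only: the stimulus labels, the function names, and the use of a seeded pseudo-random shuffle are assumptions, since the source does not say how the original randomization was done.

```python
import random

STIMULI = ["b", "d1", "d2", "g"]  # assumed labels for /b/, /d/1, /d/2, /g/


def base_pairs():
    """The 16 pairs of one block: /b/-/g/ and /g/-/b/ dropped,
    /b/-/b/ and /g/-/g/ entered twice."""
    pairs = [(a, b) for a in STIMULI for b in STIMULI if {a, b} != {"b", "g"}]
    pairs += [("b", "b"), ("g", "g")]
    return pairs


def make_list(replications=5, seed=0):
    """Concatenate independently shuffled blocks (80 pairs by default)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(replications):
        block = base_pairs()
        rng.shuffle(block)  # randomize within the block only
        trials.extend(block)
    return trials


def is_same(pair):
    """Two syllables are 'same' if they share a category (first letter)."""
    return pair[0][0] == pair[1][0]


block = base_pairs()
print(len(block), sum(is_same(p) for p in block), len(make_list()))
```

Counting the pairs this way confirms the equal split stated in the text: 8 "same" and 8 "different" pairs per block of 16.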

Procedure 

Each subject participated in two 90-minute sessions on different days. The
sequence of the two tasks was counterbalanced across subjects. In one session,
the subject was instructed to judge whether the two syllables in a pair were the
same or different by pressing the response key with the appropriate label
(same-different task). In the other session, the instructions were to ignore the
first syllable and to classify the second syllable as either "D" or "non-D,"
that is, "B or G" (classification task). The subjects were told that there were
three syllables, /bae/, /dae/, and /gae/. In the classification task, they were
informed that /b/ and /g/ never occurred together in a pair but that, apart from
this, the first syllable provided no clue about the second syllable. The
subjects were encouraged to be as fast and as accurate as possible. A practice
series of 32 pairs at SOA = 500 was presented at the beginning of each session.

In each session, a subject listened to the experimental tape twice, once
binaurally and once dichotically (i.e., with the warning tone and the first
syllable in one ear and the second syllable in the other ear; "dichotic" is used
here in the wider sense of "different, but not necessarily simultaneous, inputs
to the two ears"). The sequence of the two presentation modes was
counterbalanced across subjects, but it was the same in both sessions for a given subject.
Which ear received the first syllable in the dichotic condition was also
counterbalanced across subjects, but fixed for each individual subject.

The tape was played back from an Ampex AG-500 tape recorder through an
amplifier/attenuator to Telephonics TDH-39 earphones. Playback intensity was
approximately 88 dB SPL (peak deflections on a voltmeter). Dichotic and
binaural presentation modes were established by means of electronic switches.
Reaction times were recorded on a Hewlett-Packard 522B electronic counter, which
was started by the onset of the warning tone and stopped by depressing either
response key. Appropriate constants were subsequently subtracted from all
latencies, so that they were measured with reference to the onset of the second
syllable in a pair. The subject used both hands for responding, one for each
key. Hand-response assignment was again counterbalanced across subjects.
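
The constants themselves are not given in the text. A plausible reading, offered here only as an assumption, is that each constant is the 500-msec lead of the warning tone plus the SOA of the list, since the counter started at warning-tone onset and the second syllable began one SOA after the first. The function name is hypothetical.

```python
WARNING_LEAD_MS = 500  # warning buzz onset precedes the first syllable by 500 msec


def corrected_latency(counter_ms, soa_ms):
    """Convert a raw counter reading (started at warning-tone onset) into a
    latency measured from the onset of the second syllable.

    Assumes the constant to subtract is warning lead + SOA; the source only
    says that "appropriate constants" were subtracted.
    """
    return counter_ms - (WARNING_LEAD_MS + soa_ms)


# A raw reading of 1550 msec in an SOA = 500 list corresponds to 550 msec.
print(corrected_latency(1550, 500))
```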

At the end of the second session, each subject was questioned whether he
or she had noticed anything about the stimuli that had not been mentioned in
the instructions, and subsequently, given that there were two different versions
of one syllable, which of the three syllables this might have been. Of the
eight subjects, three showed no awareness whatsoever (they also had the lowest
error rates), three stated that /d/ and /g/ were more difficult to discriminate
than /d/ and /b/, and the remaining two claimed hearing /blae/ on occasion
(these two had the highest error rates). No subject guessed correctly that two
/d/s were involved; all guesses were either "two /b/s" or "two /g/s."

The first step in the data analysis was to calculate the median of the
reaction times for the five replications of each stimulus pair in each list,
omitting errors. Further analysis was in terms of the means of these medians.
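
The two-step summary described above (medians within cells, then means of medians) can be sketched as follows. The data layout, a list of `(latency_ms, correct)` tuples per cell, and the function names are assumptions made for illustration; the latencies in the example are invented, not data from the study.

```python
from statistics import mean, median


def cell_median(trials):
    """Median latency over the correct replications of one stimulus pair
    in one list; returns None if every replication was an error."""
    correct = [rt for rt, ok in trials if ok]
    return median(correct) if correct else None


def mean_of_medians(cells):
    """Average the cell medians, skipping cells without correct responses."""
    medians = [m for m in (cell_median(c) for c in cells) if m is not None]
    return mean(medians)


# Invented example: five replications of one pair, one of them an error.
cell = [(500, True), (520, True), (900, False), (510, True), (530, True)]
print(cell_median(cell))
```

Using the median within each cell limits the influence of occasional very slow responses, which matters when only five replications are available per cell.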

RESULTS

Errors

Since latencies cannot be fully understood without taking the error pattern
into consideration, the errors shall be presented first. There was great
individual variation: average error rates ranged from 2.0 to 17.7 percent. As
pointed out above, they were positively related to the degree of awareness the
subject had of the presence of four stimuli. However, no subject made
consistent misclassifications or misjudgments of certain stimuli; several changed
their error trends during the course of a session.

The overall error rates in the two tasks were similar (same-different:
9.3 percent; classification: 9.2 percent). There was a tendency to commit more
errors at the shorter SOA (10.2 percent) than at the longer one (8.3 percent).
The most striking difference was between the dichotic and binaural conditions,
with almost twice as many errors in the former (11.3 percent) as in the latter
(6.7 percent). As might be expected, this difference was more pronounced in the
same-different task, but it was also present in the classification task.

In the classification task, /d/ stimuli were misclassified more often than
/b/ and /g/ stimuli (14.2 vs. 4.2 percent). Most of the errors on /b/ and /g/
were probably due to inattention and/or hand-response confusions that were not
separately identified in this study (i.e., subjects were not asked to "correct"
their own errors). /d/1 was misclassified more often than /d/2 (18.8 vs.
9.6 percent). Misclassifications of /d/ as /b/ or as /g/ were not
distinguishable in this task, but it seems likely that /d/1 was mostly confused with /b/,
and /d/2 with /g/. The nature of the preceding stimulus seemed not to make any
difference.

In the same-different task, two interactions similar to those predicted for
the latencies were expected, since errors and latencies tend to be positively
correlated in same-different tasks. Incorrect "same" judgments should have been
more frequent in /d/1-/b/ and /d/2-/g/ (and reverse) pairs than in /d/1-/g/ and
/d/2-/b/ (and reverse) pairs, and incorrect "different" judgments should have
been more frequent in /d/1-/d/2 (and reverse) pairs than in /d/1-/d/1 and
/d/2-/d/2 pairs. Both trends were present but not very pronounced (13.3 vs.
9.9 percent, and 9.5 vs. 8.1 percent, respectively). Most surprising was the
fact that /b/-/g/ (and reverse) pairs did not show a substantially lower error
rate than other pairs (9.2 percent). Clearly, then, the same-different judgment
errors could not be predicted from the classification errors, which were more
than three times higher for /d/ stimuli than for /b/ and /g/ stimuli. This
indicated either that the two stimuli in a pair were matched before complete
classification, or that the classification of the second syllable was not
independent of the preceding syllable. No such dependence was evident in the
classification errors, however.

Latencies 

It was anticipated that the latencies of subjects with high and low error
rates might have to be considered separately, because of the positive correlation
between errors and latencies that is usually found. However, this proved to be
unnecessary, since the results were completely negative, overall and for each
individual subject. While some effects may not have reached significance because
of the small number of subjects, the differences of principal interest were
clearly not obtained.

Consider first the same-different task. The results for "same" judgments
are shown in the first three columns of Table 1. In three of the four
conditions, the predicted interaction (the difference between the second and third
columns) was in the expected direction (positive) but small; in the fourth
condition, binaural at SOA = 2000, it was in the opposite direction. No difference
reached significance, and no individual subject showed a clear pattern. The
more consistent trend toward longer reaction times at SOA = 2000 than at SOA = 500
also fell short of significance.



TABLE 1: Reaction times in the same-different task. (Note: the plus
         sign indicates that the reverse order of the stimuli is
         included.)

                            "Same"                        "Different"

Mode      SOA    /b/+/b/   /d/1+/d/1   /d/1+/d/2    /b/+/d/1   /b/+/d/2
                 /g/+/g/   /d/2+/d/2                /g/+/d/2   /g/+/d/1

Dichotic   500     523       536         552          547        547
          2000     560       565         581          582        589

Binaural   500     526       521         542          562        564
          2000     550       569         546          586        581



The last two columns of Table 1 show the "different" latencies. Here, it
was predicted that the latencies in column 4 would be shorter than those in
column 5. Clearly, there was no difference at all. The only consistent tendency
seems to be again longer latencies at the longer SOA, but it did not reach
significance. It will also be noted that "same" latencies were somewhat faster than
"different" latencies, a difference that is commonly found and was not tested for
significance.
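
The comparisons described for Table 1 amount to simple column differences. As a minimal sketch, the cell means below are transcribed from Table 1 as it can be read in this copy (two entries are reconstructed from damaged print and should be treated as assumptions); the dictionary layout and function names are illustrative.

```python
# Table 1 cell means in msec, keyed by (mode, SOA in msec).
# Columns 1-3 are "same" pairs, columns 4-5 are "different" pairs.
TABLE_1 = {
    ("dichotic", 500): (523, 536, 552, 547, 547),
    ("dichotic", 2000): (560, 565, 581, 582, 589),
    ("binaural", 500): (526, 521, 542, 562, 564),
    ("binaural", 2000): (550, 569, 546, 586, 581),
}


def same_effect(row):
    """Column 3 minus column 2: nonidentical minus identical /d/ pairs."""
    return row[2] - row[1]


def different_gap(row):
    """Column 5 minus column 4 for the "different" pairs."""
    return row[4] - row[3]


for condition, row in TABLE_1.items():
    print(condition, same_effect(row), different_gap(row))
```

On these values the "same" difference is positive in three conditions and negative in the binaural SOA = 2000 condition, as stated in the text, while the "different" columns differ by at most a few msec in either direction.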




There were two effects that did reach significance: the Mode x Order and
Mode x "B vs. G" interactions (p < .01 and p < .05, respectively). They are
shown in Table 2.



TABLE 2: Two interactions in the same-different task. (Note: the dash
         indicates a specific order of the two stimuli in a pair; /d/ implies both
         /d/1 and /d/2.)

Mode       /b/-/d/   /g/-/d/   /d/-/b/   /d/-/g/

Dichotic     564       553       584       566

Binaural     567       596       554       574



It can be seen that, in the dichotic condition, pairs in which /d/ occurred
first tended to have longer "different" latencies than pairs in which /d/
occurred second, and pairs containing /b/ tended to have longer latencies than
pairs containing /g/. The opposite was true in the binaural condition. These
effects are difficult to interpret.
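
Both interactions can be read off Table 2 as differences of marginal means. The sketch below uses the Table 2 values; the dictionary layout and function names are illustrative assumptions, not part of the original analysis.

```python
from statistics import mean

# Table 2 cell means in msec; keys are (first syllable, second syllable).
TABLE_2 = {
    "dichotic": {("b", "d"): 564, ("g", "d"): 553, ("d", "b"): 584, ("d", "g"): 566},
    "binaural": {("b", "d"): 567, ("g", "d"): 596, ("d", "b"): 554, ("d", "g"): 574},
}


def order_effect(mode):
    """Mean latency with /d/ first minus mean latency with /d/ second."""
    cells = TABLE_2[mode]
    d_first = mean(v for (first, _), v in cells.items() if first == "d")
    d_second = mean(v for (_, second), v in cells.items() if second == "d")
    return d_first - d_second


def consonant_effect(mode):
    """Mean latency of pairs containing /b/ minus pairs containing /g/."""
    cells = TABLE_2[mode]
    with_b = mean(v for pair, v in cells.items() if "b" in pair)
    with_g = mean(v for pair, v in cells.items() if "g" in pair)
    return with_b - with_g


for mode in TABLE_2:
    print(mode, order_effect(mode), consonant_effect(mode))
```

Both differences are positive in the dichotic mode and negative in the binaural mode, which is the crossover pattern described in the text.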

We turn now to the classification condition. The results for those
syllables that were preceded by a syllable from the same category are shown in the
first three columns of Table 3.



TABLE 3: Reaction times in the classification task.

Mode      SOA    /b/-/b/   /d/1-/d/1   /d/1-/d/2
                 /g/-/g/   /d/2-/d/2   /d/2-/d/1

Dichotic   500     543       54?         518
          2000     600       578         594

Binaural   500     533       544         534
          2000     562       532         537


Mode      SOA    /b/-/d/1   /b/-/d/2   /d/1-/b/   /d/1-/g/
                            /g/-/d/1   /d/2-/g/

Dichotic   500     567        552        505        504
          2000     619        608        595        570

Binaural   500     564        563        527        535
          2000     578        587        530        540



Again, there is no clear evidence for the expected effect (faster latencies
in column 2 than in column 3). In the dichotic mode, there was a notable
tendency to be slower at SOA = 2000 (not significant), which indicates that the
preceding syllable was not completely ignored. The results for syllables preceded
by a syllable from a different category are shown in the remaining columns of
Table 3. Again, there is no obvious difference between columns 4 and 5, and
columns 6 and 7. However, /b/ and /g/ classification was faster than /d/
classification, and the latencies were again longer at SOA = 2000. No effect reached
significance. A facilitating effect of a preceding stimulus from the same
category may be noted, but only for /d/ classification.

DISCUSSION 

This experiment provided no evidence for the availability of auditory
information in the comparison of syllables from a "place continuum." Although
only eight subjects were tested, their results make it quite unlikely that any
significant effects would emerge in a larger sample, except for the trivial
findings that latencies increase with SOA and that "same" latencies are faster
than "different" latencies. Note that, although the data for "same" latencies
in Table 1 may be suggestive of a small effect, no individual subject showed a
clear pattern of results, despite reasonably stable data (10 replications of
each stimulus pair).

Of course, it is entirely possible that the results of Pisoni and Tash
(1974) pertain only to differences in VOT, a temporal variable, while
differences in formant transitions are not retained in auditory memory. However, a
study conducted independently at about the same time as the present experiment
by Eimas and Miller (1975) did find a positive effect.

Their study is the more remarkable because it used stimuli from the
identical place continuum, originally prepared at Haskins Laboratories by David Pisoni
(see Pisoni, 1971). (The present /b/, /d/1, and /d/2 were their stimuli 1, 6,
and 8, respectively; see their Table 1. Their continuum did not include /g/
stimuli.) They used a design similar to that of Pisoni and Tash (1974),
counterbalanced for "same" pairs but not for "different" pairs. Eimas and Miller
were aware of the alternative explanation for effects on "different" reaction
times and emphasized the comparison of "same" reaction times for identical and
nonidentical pairs. There were three SOAs (310, 460, and 1000 msec) that were
randomized. At the intermediate SOA, which approximates the shorter SOA in the
present study, they found a 44-msec difference in "same" reaction times and a
73-msec difference in "different" reaction times, both in the predicted
direction. Moreover, the effect on "same" reaction times, but not that on "different"
reaction times, decreased as SOA increased. This provides convincing evidence
for the involvement of some auditory memory at short SOAs and for its decay over
time. It also suggests that the effect on "different" reaction times probably
does not reflect auditory memory but differences in categorization time for the
component stimuli.

Eimas and Miller's study is elegant and well-designed, and their results
must be taken seriously. It will require further research to clarify why the
present study did not obtain the same effects, in the absence of any obvious
flaws in design. Of course, if the effect on "different" reaction times is due
to differences in categorization time alone, no effect should have been obtained
in the present balanced design because such differences cancel out. Seen in this
way, this portion of the present results even supports Eimas and Miller.
However, the reason for the present failure to obtain an effect of acoustic
differences on "same" reaction times remains obscure.




REFERENCES 



Besner, D. and M. Coltheart. (1975) Same-different judgments with words and
     nonwords: The differential effect of relative size. Mem. Cog. 3, 673-677.

Cole, R. A., M. Coltheart, and F. Allard. (1974) Memory of a speaker's voice:
     Reaction time to same- or different-voiced letters. Quart. J. Exp. Psychol.
     26, 1-7.

Eichelman, W. H. (1970) Stimulus and response repetition effects for naming
     letters at two response-stimulus intervals. Percept. Psychophys. 7, 94-96.

Eimas, P. D. and J. L. Miller. (1975) Auditory memory and the processing of
     speech. In Developmental Studies of Speech Perception (Walter S. Hunter
     Laboratory of Psychology, Brown University, Providence, R. I.), Progress
     Report No. 3, pp. 117-135.

Entus, A. and D. Bindra. (1970) Common features of the "repetition" and
     "same-different" effects in reaction time experiments. Percept. Psychophys. 7,
     143-148.

Fujisaki, H. and T. Kawashima. (1970) Some experiments on speech perception
     and a model for the perceptual mechanism. In Annual Report of the
     Engineering Research Institute (Faculty of Engineering, University of Tokyo)
     29, 207-214.

Pisoni, D. B. (1971) On the nature of categorical perception of speech sounds.
     Unpublished Ph.D. thesis, University of Michigan.

Pisoni, D. B. (1973) Auditory and phonetic memory codes in the discrimination
     of consonants and vowels. Percept. Psychophys. 13, 253-260.

Pisoni, D. B. and J. Tash. (1974) Reaction times to comparisons within and
     across phonetic categories. Percept. Psychophys. 15, 285-290.

Posner, M. I. (1969) Abstraction and the process of recognition. In The
     Psychology of Learning and Motivation: Advances in Research and Theory,
     Vol. III, ed. by G. H. Bower and J. T. Spence. (New York: Academic
     Press).

Posner, M. I., S. J. Boies, W. H. Eichelman, and R. L. Taylor. (1969)
     Retention of visual and name codes of single letters. J. Exp. Psychol.
     (Monograph Suppl.) 79, 1-16.

Posner, M. I. and S. W. Keele. (1967) Decay of visual information from a single
     letter. Science 158, 137-139.

Posner, M. I. and R. F. Mitchell. (1967) Chronometric analysis of
     classification. Psychol. Rev. 74, 392-409.

Repp, B. H. (1975) Categorical perception, auditory memory, and dichotic
     interference: A "same"-"different" reaction time study. J. Acoust. Soc.
     Am., Suppl. 57, S51(A).

Repp, B. H. (1976a) Effects of fundamental frequency contrast on identification
     and discrimination of dichotic CV syllables at various temporal delays.
     Mem. Cog. 4, 75-90.

Repp, B. H. (1976b) Identification of dichotic fusions. Haskins Laboratories
     Status Report on Speech Research SR-45/46 (this issue).

Springer, S. P. (1973) Memory for linguistic and nonlinguistic dimensions of
     the same acoustic stimulus. J. Exp. Psychol. 101, 159-163.

Studdert-Kennedy, M., A. M. Liberman, and K. N. Stevens. (1963) Reaction time
     to synthetic stop consonants and vowels at phoneme centers and at phoneme
     boundaries. J. Acoust. Soc. Am. 35, 1900.




Weak Syllables in a Primitive Reading-Machine Algorithm
George Sholes



ABSTRACT

Weak syllables are syllable types in the pronouncing dictionary
of the reading machine. Weakened syllables, in the output string of
the machine, come either from weak dictionary syllables or from full
dictionary syllables that have been subjected to gradation. In
either case, weakened syllables are further subject to certain
mergers and may exhibit special segmental allophones. Weakened
syllables of all kinds may also condition shortening of the full
syllables they immediately follow. This compression seems to come from a
kind of inclusion of the weak syllable by the full syllable. It does
not occur across phonological word boundaries and by this fact helps
to identify phonological word boundaries in the output.



Weak syllables, in this version of mechanical American English, are a
special syllable-type which, among other things, typically comes to carry the
lowest level of stress and so ends up at the bottom of the prominence heap.
But weak and weakened syllables are also terms involved in a number of key
operations, among which are gradation, certain neutralizations, and the selection of
special segmental allophones. Finally, weak syllables condition a noticeable
compression of full syllables they immediately follow. The absence of such
compression, when a phonological word boundary intervenes, is a strong cue for the
presence of the word boundary.¹

In Section I of this paper syllable-types in the machine will be outlined
and the operations of gradation, neutralization, and allophone selection will be
identified. In Section II the shortening effect of weak syllables on full
syllables will be explored. Between the two sections a brief interlude will
characterize the machine itself including the pronouncing dictionary and
phonological string, of which weak and weakened syllable-types are parts.



¹The phonological string of the machine is a hierarchical structure of
segmentals, syllables, phonological words, and phonological phrases (cf. Pike, 1945,
1967). What are called phonological words here are called total contours in
Pike (1945) and stress groups in Pike (1967). What are called phonological
phrases here are called rhythm units in Pike (1945) and pause groups in Pike
(1967). What are called weakened syllables here are among those tentatively
called ballistic syllable-types in Pike (1967:368-369).




SECTION I 



Syllable Types 

In the pronouncing dictionary of the machine, phonetic entries are made up
of combinations of three types of syllables. Weak, as a syllable-type in the
dictionary, is illustrated by the last syllable of the following print-words:
"soda, city, window, Hindu, beater, beetle, bottom, cotton, rotting." The
other two types of syllable in the dictionary are stressable and plain.
Stressable syllables are illustrated by the first syllable of the print-words in the
list just given. Plain syllables are those which never take stress (much less
pitch-accent), on the one hand, and, on the other, are not subject to mergers
(neutralization); nor do they condition full syllable shortening. Illustrations
of plain syllables are the first syllables of "ideal" and of "psychology" and
the last syllables of the verb "veto" (but not the noun) and of "telephone." In
sum, plain syllables ("ideal, telephone") will never be stressed in any text
occurrence; neither will they be degraded, that is, replaced by schwa or a weak
syllabic sonorant.

It is to be noted that each print-word pronunciation in the dictionary
contains at least one stressable syllable and that some pronunciations contain two
or more stressable syllables. Examples of multistressable print words are
"sardine(s)" and "pastel(s)" (both syllables) and "intonation" and "California"
(first and third syllables). In citation pronunciation, because it means
end-of-phrase, the last stressable syllable in a multistressable word would normally
be stressed (and get the pitch accent): "(can of) sardines," "(box of) pastels,"
"intonation," "California." Within a phrase, an earlier stressable syllable may
be stressed: "sardine sandwich," "pastel picture," "intonation contour,"
"California sunshine." The number of weak or plain syllables in a dictionary
pronunciation has no upper or lower limits.
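
The citation-form rule stated above (stress the last stressable syllable) can be sketched as a small function. This is an illustration under assumptions, not the reading machine's actual code: the list-of-type-labels representation and the function name are hypothetical, and the types assigned to the second and fourth syllables of "intonation" in the example are placeholders, since the text specifies only which syllables are stressable.

```python
def citation_stress_index(syllable_types):
    """Return the index of the syllable stressed in citation pronunciation:
    the last stressable syllable of the dictionary entry."""
    stressable = [i for i, t in enumerate(syllable_types) if t == "stressable"]
    if not stressable:
        # Every dictionary pronunciation is said to contain at least one.
        raise ValueError("entry has no stressable syllable")
    return stressable[-1]


# "sardine": both syllables stressable, so the second is stressed in citation.
print(citation_stress_index(["stressable", "stressable"]))

# "intonation": first and third syllables stressable (other types assumed).
print(citation_stress_index(["stressable", "weak", "stressable", "weak"]))
```

Within a phrase the choice may fall on an earlier stressable syllable, as in "sardine sandwich"; that selection depends on context the sketch does not model.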

The distinction between stressable and stressed is thus one between
dictionary pronunciation (stressable) and phonological string pronunciation (stressed).
In the dictionary, stress is a potential of certain syllables, the stressable, a
potential which may or may not be realized in some occurrence in a phonological
string. A similar distinction applies to weak syllables in the dictionary and
actually weakened syllables in the phonological string. By contrast, plain
syllables in the dictionary carry over only into plain syllables in the
phonological stress string. Figure 1 shows the possibilities.²

Gradation

The dashed line from stressable to weakened, which breaks a certain
symmetry in Figure 1, represents the working of the operation called gradation.



²The three syllable-types in the dictionary correspond to the three stress
levels posited by Newman (1946), provided one moves Newman's sonorous weak in
pre-heavy position to reading-machine plain. Component features that would define
the four types in the phonological string could correspond with the first
three suprasegmental features of Vanderslice and Ladefoged (1972): plus or
minus heavy, accent, intonation. Correspondences can be made with other three-
and four-way systems.



[Figure 1: Syllable-types. Dictionary: stressable, plain, weak.
Phonological string: accented, stressed, weakened.]



Gradable syllables in the dictionary may be realized as stressed, plain, or
weakened in the phonological string. By contrast, most stressable syllables
may be realized only as stressed or plain. Gradation applies to a small number
of monosyllabic structural words, such as "of," "at," "do." Only some four
dozen dictionary words are subject to gradation, but they are all very frequent
text words. When a gradable syllable does appear in weakened form, it behaves
like weakened syllables which come in the usual way from dictionary weak
syllables: a syllable weakened by gradation is just like any other weakened
syllable.
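The correspondences between dictionary syllable-types and string syllable-types, together with gradation, can be sketched as a small lookup. This is only an illustration under stated assumptions: the names `REALIZATIONS`, `GRADABLE_WORDS`, and `possible_realizations` are hypothetical, the word list holds just the three examples quoted above, and accent is left out.

```python
# Dictionary syllable-type -> string types it may surface as (after Figure 1).
# Accent is deliberately left out of this sketch.
REALIZATIONS = {
    "stressable": {"stressed", "plain"},
    "plain": {"plain"},
    "weak": {"weakened", "plain"},
}

# Three of the "some four dozen" gradable structural words named in the text.
GRADABLE_WORDS = {"of", "at", "do"}


def possible_realizations(word, dictionary_type):
    """Return the set of string-level types a dictionary syllable may take."""
    options = set(REALIZATIONS[dictionary_type])
    # Gradation: a gradable monosyllable may additionally surface as weakened.
    if dictionary_type == "stressable" and word in GRADABLE_WORDS:
        options.add("weakened")
    return options
```

Here `possible_realizations("of", "stressable")` includes "weakened", while an ordinary stressable syllable such as "bill" is limited to stressed or plain.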



For ease of exposition, it is useful to have a cover term for nonweak or
nonweakened syllables. Full syllable will be the label that includes stressable
and plain, or stressed and plain, syllables.

Allophone Selection

When print words are strung together, consonant segmentals may come together
at print-word boundaries. These consonant clusters may be smoothed out by
reduction (dropping) or by altering component features when the syllable-type
sequence over the print-word boundary is full-plus-weak. For example, the
print word "miss" is stored in the dictionary with the citation pronunciation
['mis] and the (gradable) print word "you" with ['yuw]. Yet the print-word
sequence "miss you," particularly in a larger context, such as "I'm going to miss



See, for instance, Kenyon (1950:104-114) and Gimson (1964:2^9-243).






you a lot," will give the phonological string fragment ['mijw]. This assembled
fragment is quite similar to the string representation of the single print word
"issue" ['ijw] in the same context: "I'm going to issue a lot." It will be
seen that the print-word boundary in the vicinity of the (de)graded and weakened
syllables of structural words may be heavily camouflaged.

A number of single consonants have special allophones in the position
between full and weak syllabics ("intervocalic position"); for example, [t, d]
are flapped and [g] appears as a fricative. The special allophone is selected
regardless of where the print-word (lexical) boundary falls. For instance, the
fragment [meydn] can represent the first two words of "made in France," with
print-word boundary on the right-hand side of the [d], or it can represent the
entire word "maiden" with no print-word boundary at all abutting the [d].
Similarly, the fragment [biykn] could represent all the print-word sequences
"beacon," "bee can," "beak and," embedded in some larger context. (This is not
to say that the print-word sequences cannot be distinguished, but rather that
they may not be.)
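The point that allophone selection ignores print-word boundaries can be illustrated with a short sketch. It assumes a hypothetical flat encoding of the string as `(kind, value)` pairs, where `"#"` stands for a print-word boundary; the function name and the restriction to [t, d] are illustrative choices, not the reading machine's actual representation.

```python
# Consonants that take a special (flapped) allophone between full and weak.
FLAPPABLE = {"t", "d"}


def special_allophone_sites(segments):
    """Return indices of single consonants standing between a full syllabic
    and a weak syllabic, disregarding print-word boundaries ("#").

    segments: list of (kind, value) pairs, kind in {"full", "weak", "C", "#"}.
    """
    # Drop print-word boundaries first: the rule does not look at them.
    visible = [(i, kind, val) for i, (kind, val) in enumerate(segments)
               if kind != "#"]
    sites = []
    for k in range(1, len(visible) - 1):
        i, kind, val = visible[k]
        if kind == "C" and val in FLAPPABLE:
            if visible[k - 1][1] == "full" and visible[k + 1][1] == "weak":
                sites.append(i)
    return sites


# "made in" (boundary after [d]) and "maiden" (no boundary) select alike.
made_in = [("C", "m"), ("full", "ey"), ("C", "d"), ("#", ""), ("weak", "n")]
maiden = [("C", "m"), ("full", "ey"), ("C", "d"), ("weak", "n")]
```

Both `made_in` and `maiden` yield the same site for the [d], which is the sense in which the two print-word sequences may fall together.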

Neutralization 

Syllables may also be weakened (carried into the phonological string as
weakened syllables) by neutralization or merger of syllable-center tambers. For
example, the syllable centers of the dictionary weak syllables of "windows" and
"Hindus" merge into a single tamber when those weak syllabics turn up in various
nonfinal contexts, such as:

All the windows are here.    'olðə'windwzr'hir//

All the Hindus are here.     'olðə'hindwzr'hir//

whereas in various final contexts, the syllabics of these print words are quite
distinct (and in the example below the dictionary weak syllables have been
assembled as plain syllables):

Here are all the windows.    'hirr'olðə'win,dowz//

Here are all the Hindus.     'hirr'olðə'hin,duwz//

In natural speech, the merged syllable [w] would have a tamber range overlapping
part of full syllable [li^ u"] and perhaps [o^]. In sum, the allophone range of
certain weakened syllabics differs from the corresponding full vowel range.

Similar contexts cue the merger of dictionary vowels [a] and [i]. For
example, the print words "him" and "them" are indistinct in:

I can see him now.
                         ,aykn'siym'naw//
I can see 'em now.



A small circle below a letter has been used to indicate a weakened-syllable
center: [ə y w r l m n ŋ]. Alternatively (and equivalently), the same
weakened-syllable centers could be written schwa or schwa plus sonorant consonant:
[ə əy əw ər əl əm ən əŋ].

See, for instance, Kingdon (1969:10) and Bolinger (1963:22).





and are distinct in: 



Now I can see him.    'naw,aykn'siy,im//

Now I can see 'em.    'naw,aykn'siy,Am//

In the end, the list of weakened syllables (vowels) in nonfinal position in
the assembled phonological string is [illegible]. For this and other reasons
it has from time to time been proposed that weak-syllable centers are best
taken as forming a separate system apart from the larger, main system of
full-syllable vowels (e.g., Hultzen, 1961; Bolinger, 1963), or that they are
positional variants of the sonorant consonants (e.g., Householder, 1957). In the
reading machine, however, it proves useful to have just one set of syllables
(vowels) and to have the syllable as a whole marked for its type.



The notation convention for marking syllable types is that full syllables
are marked where they begin, while phonological words and phrases are marked
where they end. Weak and weakened syllables are not considered to have
boundaries of their own at all. By this means all distinctions of the kind "gray
day" versus "grade A" and "a nice ..." versus "an ice ..." are automatically
assembled. (See Jones, 1931, 1956; Lehiste, 1960; Hoard, 1966; Lee, 1970.)

However, this style of marking also requires that the syllable centers of
"hot" and "heart" be written with different symbols. This is because the full
vowel of "hot" may, in the assembled string, be followed by [r] and then a
weakened syllable. It must still remain distinct from the full vowel of "heart"
plus [r] plus weakened syllable. A test pair would be:

bas ^relief vs. bar ^a leaf 

which can be held separate when pronounced with phonological word boundary at
the points shown. When the boundary is omitted (with concomitant full-syllable
compression to the left; see Section II below), the phrases are still distinct:

bas-relief ≠ bar a leaf

'bara'liyf// 'bars'li^f// 

Similarly, with phonological word boundary omitted:

Ma renewed ≠ mar a nude

'marVn^d// ' 'mara'nu^d// 
and also: 

paw repair ≠ pour a pair

'pora,per// ≠ 'por3,per//

It is nonetheless possible to write the syllable center of "bird" either
as a unit, [^], or as a sequence of wedge plus [r], [Ar], with no contrastive
difference. Full-syllable wedge will never otherwise be followed by [r] in




MACHINE INTERLUDE 

With this much of a sketch of weak syllables and weak syllable operations,
the reading machine itself can be characterized in general terms. It is an
algorithm and a machine in the sense that it is a series of computer programs.
It reads in the sense that it, together with the hardware attached to it,
converts strings of print representations into an acoustic signal that is a
simulation of speech. Finally, it is primitive in that a human editor is asked
to intervene at one point to add information that is not available automatically.


Schematically, the machine moves from print text to synthetic speech in two
large steps, as shown in Figure 2. First, the print text is turned into a
phonological string; then the phonological string is converted into parameter
frames that drive an electronic synthesizer, the output of which is an audio
signal that can be heard as speech.

The first step converts the print text into a phonological string. This 
involves chunking the print text up into print words, then replacing the print 
words by their dictionary pronunciations, and then reassembling the text. At 
the end of this first step, the text appears in a phonetic notation where orig- 
inally it stood in ordinary English spelling. 



Reassembling the text after the dictionary look-up is a procedure of some
complication. The vowel mergers and consonantal simplifications suggested in
Section I above are an important part of reassembly. The dictionary look-up, by
contrast, is quite simple. The dictionary is presented with an orthography,
such as "cat"; thereupon it returns ['kæt] plus the tag for open-class words.
In this way the dictionary provides the segmental phonemes and the basic syllable
structure of the phonological string. The rest is up to the editor. He marks
for phonological words and phrases and, since these carry the intonation, the
intonation. The editor is thus standing in for what appears to be a syntactic,
semantic analysis of the print text. He is also carrying out certain independent
phonological decisions.
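The look-up half of the first step can be sketched in a few lines. This is a minimal illustration, not the reading machine's programs: the function names are hypothetical, only the "cat" entry is taken from the text, and the "the" entry and the tag strings are invented placeholders. Reassembly and the editor's marks are not modeled.

```python
# Toy dictionary: orthography -> (citation pronunciation, tag).
# Only the "cat" entry comes from the text; the rest is invented for illustration.
TOY_DICTIONARY = {
    "cat": ("'kæt", "open-class"),
    "the": ("ðə", "structural"),
}


def chunk(print_text):
    """Split print text into print words, dropping case and edge punctuation."""
    words = (w.strip(".,;:!?\"'").lower() for w in print_text.split())
    return [w for w in words if w]


def look_up(print_word):
    """Return (pronunciation, tag) for a print word; KeyError if unlisted."""
    return TOY_DICTIONARY[print_word]


def first_step(print_text):
    """Chunk the text and replace each print word by its dictionary entry.

    Reassembly proper (mergers, simplifications, the editor's marks) is not
    modeled here; the entries are simply returned in text order.
    """
    return [look_up(word) for word in chunk(print_text)]
```

Calling `first_step("The cat.")` returns the two entries in text order, each still carrying its tag for later stages to use.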



this kind of American English. Schwa plus [r] may occur in weakened syllables
at print-word boundary joints. When this happens, schwa plus [r] will not
contrast with syllabic [r] in a weakened syllable. A test pair, with phonological
word boundary included, would be:

rows are applied vs. Rosa ^replied 

When the boundary is omitted, the two phrases fall together and are indistinct:

'ro^zra'pla^d// = 'ro^zBra'playd//

and in other such instances, sequences of weakened schwa plus sonorant are
taken as equivalent to the syllabic sonorant alone.

This characterization of the machine is not only general, it is idealized. In
particular, the introduction of the editor can be taken as an expository device.




[Figure 2. Labels: PRINT-TEXT; editor (phonological-phrases,
phonological-words, intonation); syllables + segmentals; tag;
PHONOLOGICAL-STRING; synthesizer; SPEECH.]



SECTION II 

This section outlines an operation called compression, or full-syllable
compression, which is an adjustment of durations. The units to be adjusted are
full syllables, both stressed and unstressed, and the essential context for the
adjustment is provided by weak syllables and phonological-word boundaries.

Other things being equal, the most powerful of the interdepending cues for
prominence is generally taken to be literal length: duration in time (Fry,
1970). Compression has the curious effect of making a full syllable salient by
shortening its duration. The most complete description of this effect has been
given by Bolinger (1963, 1965).

Consider a phrase consisting entirely of full syllables, that is, devoid of 
weakened syllables: 

'YOU ,MAKE 'BILL ,LOOK 'GOOD //

It is generally possible to insert a weakened syllable into such a phrase with 
absolutely no increase in overall phrase duration. In fact, the new phrase is 
just as long as the original. The definite article "the" will do for insertion. 
It gives: 



What are called phonological-word boundaries here are called intonation breaks
in Pike (1945). See also the discussion of Solutions A, B, and C in Pike
(1967:405-409).



'YOU ,MAKE THE 'BILL ,LOOK 'GOOD //

The indefinite article and certain possessives, all as weakened syllables, do
the same:

'YOU ,MAKE A 'BILL ,LOOK 'GOOD //

'YOU ,MAKE HER 'BILL ,LOOK 'GOOD //

Inserting a full syllable rather than a weak syllable does not give the 
same result. The phrase becomes not only longer in segmentals and syllables, it 
also becomes longer in total duration. The demonstrative "that" will do for 
full-syllable insertion. It gives:

,YOU 'MAKE ,THAT 'BILL ,LOOK 'GOOD //

When a weak syllable is inserted, something in the original phrase is com- 
pressed to make room for it. When a full syllable is inserted, this compression 
does not occur. What gets compressed when a weak syllable is inserted is the 
full syllable to the left of the weak syllable. In these examples, this is the 
print word "make": it is compressed in the fragments: "make the bill, make
'er bill, make a bill"; "make" stands at its normal length in the fragments: 
"make bill, make that." 

Bolinger is at pains to point out that compression or its absence is
independent of I(mmediate) C(onstituent)-cuts. The articles, demonstrative and
possessives go syntactically with the next item to the right, the print word
"bill": "a bill, the bill, that bill, her bill." As weak (and then
weakened) syllables, they nonetheless compress the syllable to the left, "make."
In short, compression is determined phonologically rather than syntactically.
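The rule described so far can be sketched as a single pass over the syllable string. This is an illustration under assumptions: the function name, the dict keys, the durations, and the compression factor of 0.5 are all hypothetical, chosen only to make the behavior visible. The optional `boundary_after` flag reflects the phonological-word boundary context named at the start of this section.

```python
def apply_compression(syllables, factor=0.5):
    """Return adjusted durations for a list of syllables in string order.

    Each syllable is a dict with keys:
      "kind"           -- "full" or "weak"
      "duration"       -- uncompressed duration (any unit)
      "boundary_after" -- True if a phonological-word boundary follows (optional)
    The factor is an assumed value, not a measured one.
    """
    adjusted = []
    for i, syl in enumerate(syllables):
        duration = syl["duration"]
        nxt = syllables[i + 1] if i + 1 < len(syllables) else None
        # A full syllable is compressed when a weak syllable follows it
        # directly, with no phonological-word boundary in between.
        if (syl["kind"] == "full"
                and nxt is not None
                and nxt["kind"] == "weak"
                and not syl.get("boundary_after", False)):
            duration = duration * factor
        adjusted.append(duration)
    return adjusted


make_the_bill = [
    {"kind": "full", "duration": 200},  # make
    {"kind": "weak", "duration": 60},   # the
    {"kind": "full", "duration": 200},  # bill
]
make_that_bill = [
    {"kind": "full", "duration": 200},  # make
    {"kind": "full", "duration": 200},  # that
    {"kind": "full", "duration": 200},  # bill
]
```

In `make_the_bill` only "make" is shortened, regardless of the fact that "the" belongs syntactically with "bill"; in `make_that_bill` nothing is shortened.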

Compression is obligatory in the sense that failure to compress a full
syllable in this context tends to give a stage (stereotyped) Scandinavian accent,
and pronunciation guides intended for Scandinavian learners of English often
explicitly point out this potential stumbling point (e.g., Lewis, 1969:50-51).
Full-syllable compression is obviously no language universal, and this suggests
that it is not even a universal for languages that have stressed syllables, as
do the Scandinavian.

By way of parenthesis, it is worth noting a possible articulatory
explanation for full-syllable compression. Ladefoged (1962), attempting to correlate
intercostal muscle activity with Stetson's (1951) chest-pulses, noted that
certain syllable sequences may be articulated on a single burst of intercostal
activity, even though the usual pairing is one chest-pulse/one syllable. He
cites the word "pity" as an example, and the word "doddered" in his Figure V
appears to have been articulated this same way.

To put it metaphorically, a full syllable in English attempts to include an
immediately following weak syllable, include it in the same production gesture.
There is, perhaps, a parallel with syllable-closing consonants, which are also
not in their most natural place at the end of a syllable. Consonants naturally
begin syllables. In this sense, both syllable-final consonants and included
weak syllables would be unnatural phonological structures, and of course both
shorten the segmental substance that precedes "in the same syllable."




What is the magnitude of compression? Lehiste (1971) has published
measurements in phrase-final position, that is, where compression is combined with
phrase-final length adjustments (and those of intonation as well). She compared
pairs such as "stead," a full syllable, with "steady," full-plus-weak. In this
position, with such pairs, the single syllable actually averages out longer in
duration than the whole compressed sequence. Not all components were equally
compressible. The full vowel is most amenable to compression. Differences
between regular and compressed vowel lengths are somewhat greater than two to one.
The leading consonants are most resilient, though nonetheless affected. Every
element in the compressed syllable is compressed to some degree.

Bolinger (1963) maintained that compression is independent of IC-cuts,
independent of the syntax. In a British tradition, compression is treated as a
correlation between the lexicon and the phonology. Abercrombie (1965) has given
an exposition from this point of view. In the R(eceived) P(ronunciation) of British
English, he notes (or perhaps declares; see Uldall, 1966, 1971) that the
spacings between stressed-syllable onsets are "of (approximately) even length": RP
stresses are isochronous. Yet given the roughly constant durations between
stressed onsets, the included segmental material may be divided over the
available time in different ways. Here he gives the classical contrast:

take Grey to London vs. take Greater ^London

In the phrase on the left, Abercrombie stated that the relative lengths of the 
syllables "Grey" and "to" are on the order of two to one, whereas in the single
word "greater" the relative syllable lengths are on the order of one to one.
For a comparable contrast with the segmentals of American English, there is: 

the rush and turmoil vs. the Russian turmoil

In sum, full-syllable compression on the left-hand side of these contrasting
pairs has been blocked by an immediately following word boundary. So an
effective cue for the presence of this word boundary would be the sequence full
plus weak syllable with an uncompressed full syllable.

Abercrombie wanted to relate (what is here called) compression to the
lexical composition of the phrase. Certain structural words (proclitics in the
examples above: "to," "and") are not independent words at all: they merge
phonologically into their neighbors. But this way of looking at things as
lexically determined, apparently, leads to overlooking yet a third possible way of
distributing the same segmental material between two stressed onsets, to wit:
with no included phonological word boundary at all.

The contrast of presence versus absence of phonological word boundary
between two stressed onsets is demonstrated by Pike (1945:37; 1967:385) with two
versions of the print phrase "a book of stories":

a book of stories vs. a book of stories

Since Pike actually recorded these examples when the earlier book appeared,
it is possible to measure his segmental durations. The difference in
compression is as clear to the tape measure as it is to the ear. The full vowel of
"book" followed by the boundary is about twice as long as the same full vowel
followed immediately by the weak syllable "of." But the upshot of this is that




the absolute durations between stressed onsets in these two versions of "a book
of stories" are distinctly different. At this level of detail, at least,
English is not literally isochronous. In fact, a phonological word boundary
gives what Householder (1957) calls "a significant rhythm break," and if that is
so, we would expect the different overall durations we do indeed find.

So a third version of the Abercrombie and American examples is possible,
this time without any included phonological word boundary, and it will be not
only shorter in total duration, but lexically ambiguous as well:

take Grey to London     =    take Greater London

'teyk'greyta'lAndn//    =    ,teyk'greyta'lAndn//

and 

the rush and turmoil    =    the Russian turmoil

aa' rAjn' ta^.j-moyi// = Se' rAfn' ta^.moyi//

I suspect this is the usual way of saying these phrases when the print words
"greater" and "Russian" are used, despite the ambiguity.

Now to these versions can be immediately added yet a fourth in which the
weak syllable previously included is left out. Over the fragment of interest,
we will now have stressed-plus-stressed, where before we had
stressed-plus-weakened-plus-stressed. Some of these truncations will be nonsense sequences,
but no matter:

take Grey London

the rush turmoil

a book stories

The uncompressed syllables "Grey," "rush," "book" followed by phonological-word
boundary here are quite comparable in length to their other occurrence followed
by phonological-word boundary:

take Grey ^to London

the rush ^and turmoil

a book ^of stories

To put it another way, when compression is blocked by a phonological-word
boundary, the ongoing calculations for segmental durations would be caught up to
that point: there do not seem to be durational dependencies of this kind
running over the phonological-word boundary.



Phonological-word boundaries are independent of lexical word boundaries,
though they frequently coincide. It is to be noted that a phonological-word
boundary may appear in the middle of a single lexical item, provided the item




SUMMARY 



Pronunciations from a dictionary look-up on a print text are reassembled
into a phonological string which is then converted into synthetic speech. The
phonological string is a hierarchical structure based on segmental phonemes
which are grouped into syllables, phonological words, and phonological phrases
by boundary marks inserted among the segmentals. Full syllables are marked
where they begin; words and phrases, where they end. Weak syllables are taken
to have no inherent boundaries at all. They may be "included" in adjacent full
syllables by effects of compression and neutralization which simultaneously
give the including phonological-word characteristic features of its prominence
silhouette.

REFERENCES 

Abercrombie, D. (1965) Syllable quantity and enclitics in English. In
Studies in Phonetics and Linguistics. (London: Oxford University Press),
pp. 26-34.

Berger, M. D. (1955) Vowel distribution and accentual prominence in modern
English. Word 11, 361-376.

Bolinger, D. L. (1963) Length, vowel, juncture. Linguistics 1, 5-29.

Bolinger, D. L. (1965) Pitch accent and sentence rhythm. In Forms of English.
(Cambridge, Mass.: Harvard University Press), pp. 139-180.

Fry, D. B. (1970) Speech recognition and perception. In New Horizons in
Linguistics, ed. by J. Lyons. (Harmondsworth, England: Penguin), 29-52.

Gimson, A. C. (1964) An Introduction to the Pronunciation of English. (London:
Edward Arnold).



is realized with two stressed syllables. Any multistressable word will lend
itself to this kind of realization and no more so than in ultracareful citation
form. Thus we have double-stressed versions, with included phonological-word
boundary, of "sardine" and "absolutely":

'sar 'diyn//        'aebsa 'luwtly//

and double-stressed versions without phonological-word boundary:

'sar'diyn//         'aebsa'luwtly//

The most usual versions retain only the last dictionary stress:

,sar'diyn//         ,aebsa'luwtly//

(See Pike, 1945:77.)

Berger (1955) notes several examples, particularly from advertising and
comic strips, where this incipient ambiguity among print words and print phrases
has been exploited: "Chip 'n Dale, Etta Kett, K-9 Corps," etc. A
phonological-word boundary is presumably more likely than not to correspond to a lexical
boundary, just as a consonant is more likely to begin a syllable than is a
vowel. Absolutely, however, the occurrence of a consonant does not establish a
syllable boundary and the occurrence of a phonological-word boundary does not
establish lexical boundary. In this sense the phonology is independent of the
lexicon, though closely related to it.




Hoard, J. E. (1966) Juncture and syllable structure in English. Phonetica 15,
96-109.

Householder, F. W. (1957) Accent, juncture, intonation, and my grandfather's
reader. Word 13, 234-245.

Hultzen, L. S. (1961) System status of obscured vowels in English. Language
37, 565-569.

Jones, D. (1931) The word as a phonetic entity. Le Maître Phonétique 34,
60-65.

Jones, D. (1956) The hyphen as a phonetic sign. Z. Phonetik 9, 99-107.

Kenyon, J. S. (1950) American Pronunciation. (Ann Arbor: George Wahr).

Kingdon, R. (1969) Grammar of Spoken English, ed. by H. E. Palmer and F. G.
Blandford. (Cambridge, England: Heffer), vol. 10.

Ladefoged, P. (1962) Sub-glottal activity during speech. In Proceedings of
the Fourth International Congress of Phonetic Sciences, ed. by A. Sovijärvi
and P. Aalto. (The Hague: Mouton), pp. 73-91.

Lee, W. R. (1970) Noticing word-boundaries. In Proceedings of the Sixth
International Congress of Phonetic Sciences, ed. by B. Hála, M. Romportl,
and P. Janota. (Prague: Academia), pp. 535-538.

Lehiste, I. (1960) An acoustic-phonetic study of internal open juncture.
Phonetica, Suppl. 5.

Lehiste, I. (1971) Temporal Organization of Spoken Language, ed. by L. L.
Hammerich, R. Jacobson, and E. Zwirner. (Copenhagen: Akademisk Forlag),
pp. 159-169.

Lewis, J. W. (1969) Guide to English Pronunciation. (Oslo:
Universitetsforlaget).

Newman, S. S. (1946) On the stress system of English. Word 2, 171-187.

Pike, K. L. (1945) The Intonation of American English. (Ann Arbor: University
of Michigan Press).

Pike, K. L. (1967) Higher-layered units of the manifestation mode of the
utterance (including the syllable, stress group and juncture). In Language in
Relation to a Unified Theory of the Structure of Human Behavior, 2d ed.
(The Hague: Mouton), chap. 9, pp. 364-432.

Stetson, R. H. (1951) Motor Phonetics, 2d ed. (Amsterdam: North Holland).

Uldall, E. T. (1966) English RP. Le Maître Phonétique 126, 34.

Uldall, E. T. (1971) Isochronous stress in R.P. In Form and Substance, ed. by
L. L. Hammerich, R. Jacobson, and E. Zwirner. (Copenhagen: Akademisk
Forlag), pp. 205-210.

Vanderslice, R. and P. Ladefoged. (1972) Binary suprasegmental features and
transformational word-accentuation rules. Language 48, 819-838.







Control of Fundamental Frequency, Intensity, and Register of Phonation*

Thomas Baer, Thomas Gay, and Seiji Niimi





ABSTRACT



Electromyographic activity of several intrinsic and extrinsic
laryngeal muscles was recorded as untrained singers produced
systematic changes in fundamental frequency (F0), intensity, and register
of phonation. For one subject, subglottal pressure was recorded
simultaneously. Cricothyroid muscle activity varied most consistently
with F0 over most of the range of F0, although the activity of
several other muscles was also related to F0. Vocalis muscle activity
varied most consistently with the shift between chest and falsetto
registers. Subglottal pressure varied consistently with changes
in vocal intensity. Activity of the extrinsic muscles was correlated
with F0 at both the high and low extremes of the chest voice range.
For at least one subject, the extrinsic muscles seemed to be solely
responsible for varying F0 at its low extreme. The activity of
muscles not directly associated with the larynx also changed
systematically with F0 at the high extreme.

Recent electromyographic (EMG) studies of the control of fundamental
frequency, intensity, and register of phonation have dealt with the intrinsic
laryngeal muscles (e.g., Hirano, Ohala, and Vennard, 1969; Hirano, Vennard, and Ohala,
1970; Gay, Hirose, Strome, and Sawashima, 1972) or with the extrinsic muscles
and subglottal pressure (Shipp and McGlone, 1971). Simultaneous recording of
intrinsic and extrinsic laryngeal muscles and subglottal pressure has been reported
for speech intonation (e.g., Collier, 1975) but not for singing. Thus, the
purpose of this study is to reexamine the nature of the control of phonation by the
intrinsic and extrinsic muscles of the larynx and by subglottal pressure.

For this study, four untrained singers produced systematic changes in
fundamental frequency (F0), intensity, and register of phonation while EMG activity
was recorded using hooked-wire electrodes (Basmajian and Stecko, 1962; Hirose,
1971). For subject TB, subglottal pressure was also measured, using a cannula



*Paper presented at the 90th meeting of the Acoustical Society of America,
San Francisco, Calif., 3-7 November 1975.

+Also University of Connecticut Health Center, Farmington.

++On leave from the University of Tokyo, Japan.

Acknowledgment: This research was supported by NIDR grants 5T22 DE00202 and
DE01774.

[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]






inserted through the cricothyroid space. Each note was produced on the
syllable /bi/. Each vocal maneuver was repeated 10 to 15 times, and average results
were calculated using the Haskins Laboratories EMG data processing system
(Kewley-Port, 1973). This system computes average activity from several
repetitions of an utterance as a function of time offset from a predetermined lineup
point associated with each token.
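The averaging idea can be illustrated with a short sketch: each token is aligned at its own lineup point, and samples at the same offset from that point are averaged across tokens. This is only a sketch under assumptions and does not reproduce the Haskins system; the function name, the list-of-samples representation, and the handling of offsets that fall outside a token are all hypothetical.

```python
def average_at_lineup(tokens, lineups, before, after):
    """Average several tokens sample by sample around their lineup points.

    tokens  -- list of sample lists (one per repetition)
    lineups -- index of the lineup point within each token
    before  -- number of samples to include before the lineup point
    after   -- number of samples to include after the lineup point
    Returns one mean per offset from -before to +after; an offset that no
    token covers yields None.
    """
    averaged = []
    for offset in range(-before, after + 1):
        values = [tok[lu + offset]
                  for tok, lu in zip(tokens, lineups)
                  if 0 <= lu + offset < len(tok)]
        averaged.append(sum(values) / len(values) if values else None)
    return averaged


# Two repetitions whose peaks sit at different absolute positions.
tokens = [[0, 1, 5, 1], [1, 5, 1, 0, 0]]
lineups = [2, 1]
```

With the lineup points set at the two peaks, `average_at_lineup(tokens, lineups, 1, 1)` averages the peaks together rather than smearing them, which is the purpose of choosing a lineup point per token.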

Figure 1 shows a typical result. The subject produced one-octave arpeggios
starting from a fundamental frequency in the middle of his chest-voice range.
The arpeggios were performed at three different intensity levels. Average
activity was calculated for each of these conditions using as lineup point the
onset of voicing for the first (lowest) note (shown on the left-hand side of the
figure) and also using the onset of voicing for the fourth (highest) note (shown
on the right-hand side of the figure).

Average activity of the cricothyroid (CT) and vocalis (VOC) muscles was
found to vary systematically with fundamental frequency, but not with intensity.
(Activity of the VOC muscle was sometimes more closely correlated with F0 than
is shown in Figure 1.) Subglottal pressure varied systematically with intensity
(or vocal effort), but its variation with frequency was smaller and less
systematic. This close correlation between subglottal pressure and intensity is
qualitatively in agreement with the results of other investigators (e.g.,
Isshiki, 1964). We plan to investigate the relationship between subglottal
pressure and fundamental frequency in more detail in the future.

Figure 2 shows similar results from subject KK, a female. Two lineup
points have been used, and the results have been superimposed in their overlap
region. Cricothyroid and VOC activity vary systematically with fundamental
frequency but not with intensity. Activity of two extrinsic muscles, the
thyrohyoid (TH) and the sternohyoid (SH), is shown. The pulsatile structure of the
TH plots shows that its activity is related to the segmental gestures for
producing the syllables. However, the symmetric envelope of activity centered
about the second lineup point shows that its level of activity is also related
to F0. The plots of SH activity show tendencies similar to those of the TH,
though they appear less dramatic in this run. The TH activity shows some
differences in activity for the highest intensity condition.

In several runs, EMG activity was recorded from the inferior constrictor
muscle. The electrodes were directed toward the cricopharyngeal part of the
muscle, and these placements were verified using activity during swallowing.
The results were inconsistent across subjects. In Figure 3, the upper plots
show the inferior constrictor data corresponding to the data in Figure 2. The
only increases in activity are associated with the first note and the last note.
This activity appears to be related to the production of the lowest frequencies,
although it could also be related to maneuvers associated with the beginning and
end of the phrase. These two interpretations could be differentiated by
performing descending-ascending rather than ascending-descending arpeggios in the
same range, but such maneuvers were not performed. The lower plots in Figure 3
show the inferior constrictor activity corresponding to the plots in Figure 1.
Here, inferior constrictor activity increases with both F0 and intensity except
for the high intensity condition, for which there is an increase of activity
associated with the first and last notes. For the other two conditions, there
is a decrease of activity immediately before the onset of the first note, and a
small increase of activity at the end of the phrase. The meaning of these results

[Figure 1: plots not recoverable from the scan.]

[Figure 3: plots not recoverable from the scan.]



is unclear, and must be further investigated with repeated insertions on the
same (and other) subjects and with other vocal maneuvers.



We reconfirmed the well-known fact that extrinsic muscle activity
contributes to the control of F0 at both extremes of a subject's chest-voice range
(e.g., Sonninen, 1956). Results from the low extreme are shown in Figure 4.
The subject produced an ascending scale at the rate of one note per second
starting at about his lowest note. Average activity of each of four laryngeal
muscles and of subglottal pressure was measured for each note and plotted as a
function of the fundamental frequency of the note in the figure. As the figure
shows, there was no significant change in CT or VOC activity for the lowest
notes, and subglottal pressure was held fairly constant throughout. However,
there were clearly changes in activity of the two strap muscles, the
sternothyroid (ST) and thyrohyoid (TH), for the lowest notes. Although we had no
reliable insertions into muscles other than the ones shown in Figure 4, it seems
reasonable to conclude that the ST and TH, and possibly other extrinsic muscles,
were responsible for producing the lowest fundamental frequencies. This result
is of interest for both singing and speech, since the low extreme of the F0
range for singing overlies the range of F0 commonly used for speech.

At the high extreme, we examined the control of register for subject TB, who
could reliably produce the same note in either chest-voice or falsetto. The
results of shifting from falsetto to chest-voice on three different notes are
shown in Figure 5. The subject sang the syllable /bi/, first in falsetto and
then in chest-voice. The lineup point for averaging was the onset of the
chest-voice note. The plots on the left-hand side of the figure show the activity
of the CT and VOC muscles and of subglottal pressure. The plots on the right-hand
side of the figure show activity of the inferior constrictor (IC) muscle and one
strap muscle, the TH. In all cases, the activity of the VOC muscle was greater
in chest-voice than in falsetto. The level of CT activity increased at the
shift from falsetto to chest-voice for the 220- and 330-Hz notes, but there
was only a very small increase for the 440-Hz note. The TH shows no change of
activity for the lower two notes, but an increase of activity for the shift into
chest-voice in the highest note. These results are consistent with the notion
that the VOC muscle is most closely associated with the control of register,
while the CT and strap muscles produce compensatory activity to regulate
fundamental frequency. Both subglottal pressure and IC activity consistently
increased during the shift from falsetto to chest-voice. The significance of this
increase is difficult to assess, especially since intensity was not controlled
in these maneuvers. Although the results are not shown here, equivalent results
showing a general decrease of activity were obtained when the shift was made
from chest-voice to falsetto.

Figure 6 shows a plot of intrinsic muscle activity at the high extreme of
the chest-voice range for subject SN. The subject produced ascending scales at
the rate of one note per second, and average activity for each note was plotted
as a function of the F0 of the note, as in Figure 4. In addition to the increase
of CT and VOC activity with fundamental frequency, both the lateral
cricoarytenoid (LCA) and posterior cricoarytenoid (PCA) muscles showed some
increase of activity with fundamental frequency. Although we were not fortunate
in achieving good PCA insertions, this figure shows at least one example in which
there was a small but systematic increase in PCA activity at the high F0 extreme.
Such a result was reported by Gay et al. (1972), but was not evident in the data
of Shipp and McGlone (1971).



[Figure 4: axes labeled "Pressure in cm H2O" and "EMG Activity in µV"]

[Figure 5]

A final point is made in Figure 7. In an otherwise unrelated experiment in
which insertions were made into several muscles of the tongue and pharynx as
well as the LCA, subject TB produced some systematic fundamental frequency
changes. This figure shows the EMG activity of several muscles during arpeggios
in the high extreme of the subject's range: the lateral cricoarytenoid (LCA),
levator palatini (LEV), styloglossus (SG), inferior longitudinal of the tongue
(IL), mylohyoid (MH), inferior constrictor of the pharynx (IC), superior
constrictor (SC), and genioglossus (GG). The lineup point is the onset of
phonation of the highest note. Although the activity of several muscles is
correlated with fundamental frequency, at least some of these (such as the LEV
and the intrinsic tongue muscles) are sufficiently unrelated to the larynx that
they are unlikely to directly affect F0. Rather, they seem to reflect a general
increase in muscle activity in the head and neck when "reaching" for the highest
notes. Although this is an extreme example, it might serve to warn that caution
must be observed in the interpretation of EMG results, especially when trying to
impute cause-and-effect between the action of a specific muscle and a specific
acoustic result.

REFERENCES

Basmajian, J. and G. Stecko. (1962) A new bipolar indwelling electrode for
     electromyography. J. Appl. Physiol. 17, 849.
Collier, R. (1975) Physiological correlates of intonation patterns. J. Acoust.
     Soc. Am. 58, 249-255.
Gay, T., H. Hirose, M. Strome, and M. Sawashima. (1972) Electromyography of
     the intrinsic laryngeal muscles during phonation. Ann. Otol., Rhinol.,
     Laryngol. 81, 401-409.
Hirano, M., J. Ohala, and W. Vennard. (1969) The function of the laryngeal
     muscles in regulating fundamental frequency and intensity of phonation.
     J. Speech Hearing Res. 12, 616-628.
Hirano, M., W. Vennard, and J. Ohala. (1970) Regulation of register, pitch,
     and intensity of voice. Folia Phoniat. 22, 1-20.
Hirose, H. (1971) Electromyography of the articulatory muscles: Current
     instrumentation and technique. Haskins Laboratories Status Report on
     Speech Research SR-25/26, 73-86.
Isshiki, N. (1964) Regulatory mechanism of voice intensity variation.
     J. Speech Hearing Res. 7, 17-29.
Kewley-Port, D. (1973) Computer processing of EMG signals at Haskins
     Laboratories. Haskins Laboratories Status Report on Speech Research SR-33,
     173-183.
Shipp, T. and R. McGlone. (1971) Laryngeal dynamics associated with voice
     frequency change. J. Speech Hearing Res. 14, 761-768.
Sonninen, A. (1956) The role of the external laryngeal muscles in length
     adjustment of the vocal cords in singing. Acta Otolaryngol., Suppl. 100.




The Effect of Delayed Auditory Feedback on Phonation: An Electromyographic
Study*

M. F. Dorman,+ F. J. Freeman,++ and G. J. Borden+++



ABSTRACT

Delayed auditory feedback (DAF) alters the temporal pattern of
laryngeal and supralaryngeal muscle activity. In some instances, the
alterations are manifest simply in terms of prolonged muscle activity,
while in other instances, the normal coherent pattern of muscle
contraction is fragmented by rapid oscillations in muscle activity.
The amplitude of electromyographic activity is also altered by DAF,
but changes in activity vary considerably between muscles and
speakers. The patterns of EMG activity correlated with dysfluencies under
DAF appear substantially different from those patterns found in
stuttering.

It is well-known that most normal speakers who hear their speech delayed by
about 200 msec become dysfluent (Lee, 1951). The dysfluencies, sometimes termed
"artificial stutter," are manifest in increased vocal intensity, prolonged
vowels, and syllable repetition (Fairbanks, 1955). Individuals who stutter,
however, become more fluent when speaking under delayed auditory feedback (DAF)
(Neelley, 1961). In this paper, which reports a portion of a long-range study
of feedback mechanisms used in the control of speech production, we consider
two questions: (1) What is the effect of DAF on the laryngeal and
supralaryngeal muscle activity of normal speakers? and (2) How does the
disruption of electromyographic (EMG) activity under DAF compare with the
disruption of EMG activity found during stuttering?



*A version of this paper was presented at the 8th International Congress of
Phonetic Sciences, Leeds, England, 17-23 August 1975.

+Also Herbert H. Lehman College of the City University of New York, and the
Graduate School and University Center of the City University of New York.

++Also Adelphi University, Garden City, N.Y.

+++Also City College, City University of New York, and the Graduate School and
University Center of the City University of New York.

Acknowledgment: We are grateful to Drs. Seiji Niimi and Tatsujiro Ushijima
of the University of Tokyo, Japan. This research was supported in part by
NIDR grant PE10774.

[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]



With respect to the first question, the most striking effect of DAF is a
change in the timing of motor activity. Figure 1 shows EMG activity from the
genioglossus (GG) during three fluent productions of the phrase "the application
of wet mud." Note that the EMG activity precedes each tongue raising event, and
that the EMG signals for the three repetitions of a given gesture evidence
similar patterns of activity. In contrast, Figure 2 shows GG activity during
the phrase "the application of wet mud," spoken under DAF. The normal timing
of motor commands has been disrupted: there are longer delays between the peaks
of EMG activity. Moreover, the patterns of EMG activity for each repetition of
a given gesture are rather dissimilar.

A comparison of Figures 1 and 2 suggests that the amplitude of the EMG
signal changes under DAF. Muscle activity generally decreases, especially when
the speech is most disrupted, as in the first two repetitions of the utterance.
It is of interest that the third production of the utterance was the most fluent
and the closest in amplitude to the utterance under normal auditory feedback.

The disruption of the normal temporal pattern of muscle activity under DAF
is correlated with two prominent aspects of dysfluency: (1) increased vowel
duration and (2) syllable repetition. Therefore, we turn now specifically to
the EMG correlates of these two phenomena.

Figure 3 shows the EMG correlates of vowel prolongation under DAF. The
recordings are from the posterior cricoarytenoid (PCA), vocalis (VOC), and
orbicularis oris (OO) muscles during the utterance "wasp sting." Under normal
feedback, the VOC, acting in concert with other vocal fold adductors to produce
closure for /a/, was active for approximately 200 msec. The PCA was active to
open the folds for the voiceless /sp/. The PCA activity was followed 100 msec
later by OO activity for /p/ closure. Under DAF, the /p/ closure and the vowels
in both "wasp" and "sting" were prolonged. The VOC activity mirrored the vowel
prolongation, showing, for example, 100 msec more activity for /a/. For the
/p/ closure, the OO evidences three peaks of activity over a 200-msec period, in
contrast to the single peak of activity over a 100-msec period under normal
feedback. Note that the EMG activity under DAF, for the OO, did not evidence a
normal, but simply prolonged, pattern of muscle contraction. Rather, the
pattern of activity was altered, evidencing rapid oscillations in muscle
contraction.

Let me now turn to an example of syllable repetition under DAF. As shown
in Figure 4, under normal feedback the superior longitudinal (SL) peaks, for
this subject, for the /l/ in "balmy" and the /ð/ in "weather." Under DAF, the
utterance was rendered as "balmy weat-hether." The SL did not evidence two
"normal" coherent peaks for each repetition of /ð/, but rather the muscle
activity was characterized by rapid oscillations.

We turn now to the question of the relationship between the EMG correlates
of dysfluency under DAF and the EMG correlates of dysfluency during stuttering.
Freeman and her colleagues (e.g., Freeman et al., 1975) have found generally
increased EMG activity, especially for the laryngeal muscles, during stuttering.
More important, perhaps, is that the normal reciprocity of laryngeal abductor
and adductors was found to be disrupted.



Figure 1: Muscle activity recorded from the genioglossus (GG) during three
          productions of the utterance "the application of wet mud" under
          normal auditory feedback.

Figure 2: Muscle activity recorded from the genioglossus (GG) during three
          productions of the utterance "application of wet mud" under DAF.

Figure 3: Muscle activity recorded from the posterior cricoarytenoid (PCA),
          vocalis (VOC), and orbicularis oris (OO) during the production of
          the utterance "wasp sting" under normal and delayed auditory
          feedback.

[Figure 4]

Figure 5: Muscle activity recorded from the tongue (SL), laryngeal adductors
          (INT and TA), and the laryngeal abductor (PCA) during fluent and
          stuttered speech.

Figure 6: Muscle activity recorded from the tongue (SL), a laryngeal adductor
          (INT), and the laryngeal abductor (PCA) during the production of
          "weather" under normal and delayed auditory feedback.

For example, Figure 5 shows EMG recordings from the abductor of the vocal
folds, the PCA, and the primary adductor of the vocal folds, the INT. Normally,
when one is active, the other is inhibited, but during the stuttering block,
for example, on the /s/ of the word "syllable," both are active simultaneously.
This loss of reciprocity disrupts normal phonation. Amplitude differences
between the fluent and stuttered utterances are readily apparent.

Muscle activity, then, for stutterers is generally of higher amplitude
during stuttered than during fluent speech, and there is evidence that the normal
reciprocal relationship of the abductor and adductor laryngeal muscles is
disrupted during stuttering blocks.

The dysfluencies in the speech of normal speakers under DAF are not like
stuttering in these two respects. First, under DAF there are amplitude changes
in the EMG signal, but the direction of change varies for different subjects
and different muscles. For example, Figure 3 indicates an increase in the level
of VOC and OO activity under DAF. In Figure 6, however, the SL shows a decrease,
the INT shows only minimal changes, and the PCA shows an increase.

The second difference in EMG activity between normal speakers under DAF
and stutterers is that during a stuttering block the disruption of reciprocity
between abductor-adductor muscles of the larynx prevents or delays normal
initiation of voicing, while for normally fluent individuals speaking under DAF,
voicing usually starts but is either prolonged or "restarted." Typically, in
dysfluencies caused by DAF, breakdown of reciprocity occurs after the initiation
of voicing. To illustrate, for the fluent production of "weather" shown in
Figure 6, the adductor (INT) is active through the utterance because all the
segments are voiced. The abductor (PCA) is suppressed throughout the utterance.
However, under DAF the abductor fires during the period in which the INT is
still strongly active.

To summarize, the main effect of DAF is to alter the temporal pattern of
laryngeal and supralaryngeal muscle activity. In some instances the alterations
are manifest simply in terms of prolonged muscle activity, while in other
instances the normal coherent pattern of muscle contraction is fragmented by
rapid oscillations in muscle activity. The amplitude of EMG activity is also
altered by DAF, but changes in activity vary considerably between muscles and
speakers. Finally, the patterns of EMG activity correlated with dysfluencies
under DAF appear substantially different from those patterns found in stuttering.

REFERENCES

Fairbanks, G. (1955) Selective vocal effects of delayed auditory feedback.
     J. Speech Hearing Dis. 20, 333-346.
Freeman, F. J., T. Ushijima, M. F. Dorman, and G. J. Borden. (1975) Dysfluency
     and phonation: An electromyographic investigation of laryngeal activity
     accompanying the moment of stuttering. Paper presented at the 8th
     International Congress of Phonetic Sciences, Leeds, England, 17-23 August.
Lee, B. S. (1951) Artificial stutter. J. Speech Hearing Dis. 16, 53-55.
Neelley, J. N. (1961) A study of the speech behavior of stutterers and
     nonstutterers under normal and delayed auditory feedback. J. Speech
     Hearing Dis. Monograph Suppl. 7, 63-82.




Some Aspects of Coarticulation*

Fredericka Bell-Berti+ and Katherine S. Harris++



ABSTRACT

The analysis of the acoustic and electromyographic experiments
reported here indicates that while there is little vowel-to-vowel
anticipatory or carryover coarticulation, carryover coarticulation
is both more common and more extensive than anticipatory
coarticulation.

INTRODUCTION

The nature and extent of coarticulation are of central interest to theories
of speech production. Previous work on this problem, for several languages, has
shown that anticipatory (or right-to-left) effects extend up to three segments,
while carryover (or left-to-right) effects extend up to two segments. In
addition, there is evidence that anticipatory effects may be different in cause
from carryover effects (for a summary of these data, see Daniloff and Hammarberg,
1973).

More specifically, Kozhevnikov and Chistovich (1965) and Daniloff and Moll
(1968) have found anticipatory effects to extend over as many as three phoneme
segments and across syllable boundaries. These effects have been explained as
the reorganization of motor patterns for speech segments. Carryover effects, on
the other hand, have often been attributed to mechanical inertia or articulator
"sluggishness" (Lindblom, 1963; Stevens and House, 1963; Henke, 1966; Stevens,
House, and Paul, 1966; MacNeilage, 1970), although the effects are now
sometimes considered to be deliberate reorganization of speech segments in the
same way anticipation is a deliberate reorganization (MacNeilage and deClerk,
1969; Sussman, MacNeilage, and Hanson, 1973; Ushijima and Hirose, 1974).

Despite the central position of coarticulation rules in a general theory of
speech production, there are very few descriptive data on the relative magnitude
of anticipatory and carryover coarticulation effects at any level. The two
experiments presented in this paper provide some of those data. They are
extremely similar in the form of the utterances examined. For technical reasons,
there are small differences in the format used in the two experiments. However,
as will become apparent, the results for a general theory of coarticulation point
in the same direction.

*A version of this paper was presented at the 8th International Congress of
Phonetic Sciences, Leeds, England, 17-23 August 1975.

+Also Montclair State College, Upper Montclair, N. J.

++Also The Graduate School and University Center, City University of New York.

THE ACOUSTIC EXPERIMENT

In the acoustic experiment, the utterance set contained 18 three-syllable
nonsense words, consisting of a stressed consonant-vowel-consonant (CVC)
preceded by [pə] and followed by [əp]. The vowel in the stressed syllable was
either /i/, /ɑ/, or /u/, and the consonants were /p/, /t/, or /k/. All
combinations of consonants and vowels were used except the symmetric ones (those
with the same consonant on both sides of the vowel), giving, for example,
/pəpikəp/, /pətupəp/, and /pəkɑtəp/. The utterances were spoken within a
carrier phrase, "Say ___ now," at a conversational rate of speech.
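
The count of 18 follows from the design: three vowels, each flanked by one of
the six ordered consonant pairs that are not symmetric. A short sketch makes
the arithmetic explicit. The transcription symbols used here (schwa as "ə",
the low vowel as "ɑ") and the function name are assumptions made for
illustration.

```python
from itertools import product

VOWELS = ["i", "ɑ", "u"]       # stressed vowels (symbols assumed)
CONSONANTS = ["p", "t", "k"]   # consonants flanking the stressed vowel


def acoustic_utterances():
    """Enumerate the nonsense words, skipping symmetric consonant frames."""
    words = []
    for c1, vowel, c2 in product(CONSONANTS, VOWELS, CONSONANTS):
        if c1 == c2:
            continue  # symmetric frames such as p_p were not used
        words.append(f"pə{c1}{vowel}{c2}əp")
    return words


utterances = acoustic_utterances()
```

Of the 27 possible consonant-vowel-consonant frames, the 9 symmetric ones are
dropped, which leaves the 18 utterance types described in the text.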

Acoustic recordings were obtained, from one speaker of American English, of
18 repetitions of each of the 18 utterance types.

The audio signal was sampled through the Haskins Laboratories
pulse-code-modulation (PCM) and Spectrum Analyzing Systems, the former for
editing, the latter for generating spectrum data. After software filtering (and
thresholding), hard copies of computer-generated spectrograms were obtained and
formant measurements made off-line.

Since second-formant (F2) position is extremely sensitive to back-to-front
tongue position and lip-rounding (that is, front cavity length), measurements
were made at seven points in each repetition of each utterance type. Averages
of 15-18 measurements for each sample point were obtained. Schematic
spectrograms of F2 were generated from these averages.

The measurement points were

1. One point in [pə] (this syllable was so weakly articulated that no
   further measures could be made for all utterances);

2. The beginning, middle, and end points of the stressed vowel;

3. The beginning, middle, and end points of [əp].

No attempt was made to account for durational variation, since the sample
time represented by each data point in the spectrogram is 12.8 msec; hence, the
time scale is too crude for detailed measurements.
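
The per-point averaging described above can be sketched as follows. This is
only an illustration of the bookkeeping, not the original analysis software:
the point labels, the use of None for a point that could not be measured, and
the F2 values in the usage lines are all hypothetical.

```python
from statistics import fmean

# One label per measurement point: one in the first schwa, three in the
# stressed vowel, three in the second schwa (labels are assumptions).
POINT_LABELS = [
    "schwa1",
    "V_begin", "V_mid", "V_end",
    "schwa2_begin", "schwa2_mid", "schwa2_end",
]


def average_f2(repetitions):
    """Average F2 (Hz) at each of the seven points across repetitions.

    Each repetition is a list of seven values; None marks a point that
    could not be measured, so the number of values averaged may differ
    from point to point.
    """
    averages = {}
    for index, label in enumerate(POINT_LABELS):
        values = [rep[index] for rep in repetitions if rep[index] is not None]
        averages[label] = fmean(values) if values else None
    return averages


# Two invented repetitions of one utterance type.
example = average_f2([
    [1500, 2000, 2100, 2000, 1700, 1600, 1500],
    [1400, None, 2300, 2100, 1800, 1600, 1500],
])
```

Allowing missing values is one way to account for averages that rest on 15 to
18 measurements rather than on all 18 repetitions.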

RESULTS OF THE ACOUSTIC EXPERIMENT

The results of this experiment are summarized in Figures 1 and 2.
Figure 1 shows the 18 utterances plotted with the first consonant held constant;
Figure 2 shows the same data with the second consonant held constant. In
Figure 1 the left-hand panel represents the averaged F2 values for utterances
whose stressed vowel is preceded by /p/, the middle panel represents the
averaged F2 values for utterances whose stressed vowel is preceded by /t/, and
the right-hand panel represents the averaged F2 values for utterances whose
stressed vowel is preceded by /k/. Within each panel, the first schwa is
represented by the single points at the left; the stressed vowel in the middle,
identified as V on the abscissa; the second schwa on the right. Second-formant
points for the vowels are marked with circles and connected with dashed lines for
utterances whose stressed vowel is /i/, triangles and dotted lines for utterances
whose stressed vowel is /ɑ/, squares and solid lines for utterances whose
stressed vowel is /u/.

[Figure 1]

[Figure 2]

We can examine the relative magnitudes of anticipatory and carryover effects
by looking at the effects of the stressed vowel on the initial and terminal schwa
vowels. One-step effects are seen in both directions: the initial schwa is
affected by the following consonant, while the second schwa is affected by the
preceding consonant. However, when we turn to the vowel-to-vowel effects, we
find that the initial schwa is not affected by the following vowel: the F2
averages for the initial schwa are not separated as a function of the following
stressed vowel. However, the same stressed vowel does change the value of the
following schwa.

In Figure 2 the left-hand panel represents the averaged F2 values for
utterances whose stressed vowel is followed by /p/; the middle panel represents
the averaged F2 values for utterances whose stressed vowel is followed by /t/;
and the right-hand panel represents the averaged F2 values for utterances whose
stressed vowel is followed by /k/. Again, within each panel, the first schwa
is represented by the single point at the left; the stressed vowel in the middle,
identified as V on the abscissa; the second schwa on the right. Second-formant
points for the vowels are connected with dashed lines for utterances whose
stressed vowel is /i/, dotted lines for utterances whose stressed vowel is /ɑ/,
and solid lines for utterances whose stressed vowel is /u/.

Looking at the second schwa, we find that the second formant is higher,
throughout its duration, when it follows /i/ than when it follows /u/ and /ɑ/,
regardless of the place of articulation of the intervening consonant.

In general, then, at the acoustic level, carryover effects are larger than
anticipatory effects. It is this asymmetry of effect that must be accounted for
at the articulatory level.

THE ELECTROMYOGRAPHIC (EMG) EXPERIMENT

One articulatory level we have chosen to examine for manifestations of
vowel-to-vowel interaction is the EMG signal. We obtained recordings from the
genioglossus muscles of three speakers of American English. The genioglossus
muscle, the major muscle mass of the tongue, acts to bunch and raise the
tongue, and is most active for high front vowels.



In this experiment there were 24 VCV utterances in which the two vowels
were all possible combinations of /i, u, a/ and were always different; the
stress was systematically varied between the first and second vowels; the
medial consonant was either /p/ or /k/. Additionally, all utterances were
preceded by [əp] and followed by [pə], resulting in utterances of the type
/əpípupə/ and [...]
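The count of 24 follows from the design just described: six ordered pairs of different vowels, two medial consonants, and two stress positions. The sketch below enumerates that design; the field names and the plain-string rendering of the carrier frame are illustrative assumptions, not the authors' notation.

```python
from itertools import permutations

VOWELS = ("i", "u", "a")
CONSONANTS = ("p", "k")
STRESS_POSITIONS = (1, 2)  # stress on the first or on the second vowel


def build_corpus():
    """Enumerate every VCV item: two different vowels, one medial
    consonant, one stress position, embedded in the [əp] ... [pə] frame."""
    corpus = []
    for v1, v2 in permutations(VOWELS, 2):  # ordered pairs, never identical
        for consonant in CONSONANTS:
            for stress in STRESS_POSITIONS:
                corpus.append({
                    "v1": v1,
                    "consonant": consonant,
                    "v2": v2,
                    "stress": stress,
                    # Hypothetical flat rendering of the framed utterance.
                    "frame": f"əp{v1}{consonant}{v2}pə",
                })
    return corpus


if __name__ == "__main__":
    print(len(build_corpus()))
```

Each minimal pair used in the tabulation then differs in exactly one vowel while the other vowel, the consonant, and the stress position are held constant.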

The data were tabulated by inspecting minimal pairs in which either the
first or the second vowel was held constant, and assigning the pairs to the
categories "no difference," "small difference," and "large difference" in EMG
activity corresponding to the constant vowel targets of each pair (Figure 3).

Figure 3: Examples of genioglossus EMG data evaluated as having no difference,
a small difference, or a large difference in target vowel activity as a
function of quality changes in the nontarget vowel. The top section gives data
for anticipatory coarticulation when the target vowel is in the first
nonneutral syllable and the middle and bottom sections for carryover
coarticulation when the target vowel is in the second nonneutral syllable.



Both magnitude and timing differences were considered in assigning the contrast
pair to one of the categories.

Anticipation was looked for in pairs in which the first vowel was constant;
carryover was looked for in pairs in which the second vowel was constant. The
number of events in each category was divided by the total number of
comparisons to determine the percentage of cases in each of the three
categories for both anticipatory and carryover coarticulatory effects.
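The tabulation described above amounts to counting category labels and dividing by the number of comparisons. A minimal sketch follows; the label strings and the example counts are hypothetical, chosen only so that the arithmetic is easy to follow (the report does not state how many comparisons were made).

```python
from collections import Counter

CATEGORIES = ("none", "small", "large")


def category_percentages(judgments):
    """Return the percentage of comparisons falling in each category."""
    total = len(judgments)
    if total == 0:
        raise ValueError("no comparisons to tabulate")
    counts = Counter(judgments)
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}


if __name__ == "__main__":
    # Hypothetical judgments for 20 minimal pairs in each direction.
    anticipatory = ["none"] * 15 + ["small"] * 5
    carryover = ["none"] * 5 + ["small"] * 9 + ["large"] * 6
    print(category_percentages(anticipatory))
    print(category_percentages(carryover))
```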



There was no difference in EMG activity for 75 percent of the anticipatory
coarticulation pairs and a small difference in 25 percent of the anticipatory
coarticulation pairs (Figure 4). There were no cases in which a large
difference in EMG activity was observed in the anticipatory coarticulation
pairs. On the other hand, there was no difference in EMG activity for only 25
percent of the carryover coarticulation pairs, a small difference in 45 percent
of the carryover coarticulation pairs, and a large difference in 30 percent of
the carryover coarticulation pairs.



Figure 4: Histogram of proportion of EMG activity magnitude differences (none,
small, large) for anticipatory and carryover vowel-to-vowel coarticulation.



In other words, there was no vowel-to-vowel anticipatory coarticulation in
75 percent of the anticipatory pairs and there were no large differences in the
anticipatory pairs, while there were large differences in 30 percent of the
carryover pairs. The results differ somewhat from the EMG data reported by Gay
(1975), which may be accounted for by differences in syllable makeup and the
rate of speech.




CONCLUSION

Our acoustic and EMG results are in agreement with Gay's (1974)
cinefluorographic examination of a very similar corpus, which showed either no
vowel-to-vowel coarticulation in either direction, or some carryover
coarticulation. These data all support the view that carryover coarticulation
is both more common and more extensive than anticipatory coarticulation and is
also a reorganization of the motor command.

REFERENCES

Daniloff, R. and R. Hammarberg. (1973) On defining coarticulation. J.
   Phonetics 1, 239-248.
Daniloff, R. and K. Moll. (1968) Coarticulation of lip rounding. J. Speech
   Hearing Res. 11, 707-721.
Gay, T. (1974) A cinefluorographic study of vowel production. J. Phonetics 2,
   255-266.
Gay, T. (1975) Some electromyographic measures of coarticulation in VCV
   utterances. Haskins Laboratories Status Report on Speech Research SR-44,
   137-
Henke, W. (1966) Dynamic articulatory model of speech production using
   computer simulation. Unpublished Ph.D. dissertation, Massachusetts Institute
   of Technology.
Kozhevnikov, V. A., and L. A. Chistovich. (1965) (in translation) Speech,
   Articulation, and Perception. (Washington, D.C.: Joint Publications
   Research Service, U.S. Department of Commerce, No. 30).
Lindblom, B. E. F. (1963) Spectrographic study of vowel reduction. J. Acoust.
   Soc. Am. 35, 1773-1781.
MacNeilage, P. F. (1970) The motor control of serial ordering of speech.
   Psychol. Rev. 77, 182-196.
MacNeilage, P. F. and J. L. DeClerk. (1969) On the motor control of
   coarticulation in CVC monosyllables. J. Acoust. Soc. Am. 45, 1217-1233.
Stevens, K. N. and A. S. House. (1963) Perturbation of vowel articulation by
   consonantal context: An acoustical study. J. Speech Hearing Res. 6,
   111-128.
Stevens, K. N., A. S. House, and A. P. Paul. (1966) Acoustical description of
   syllabic nuclei: An interpretation in terms of a dynamic model of
   articulation. J. Acoust. Soc. Am. 40, 123-132.
Sussman, H. M., P. F. MacNeilage, and R. J. Hanson. (1973) Labial and
   mandibular dynamics during the production of bilabial stop consonants. J.
   Speech Hearing Res. 16, 397-420.
Ushijima, T. and H. Hirose. (1974) Electromyographic study of the velum during
   speech. J. Phonetics 2, 315-326.




The Function of Strap Muscles in Speech*
Donna Erickson+ and James E. Atkinson++



ABSTRACT

Association of cricothyroid activity with high or rising fundamental
frequency (F0) and strap activity with low or falling F0 in speech has been
confirmed by numerous electromyographic (EMG) experiments. The purpose of this
study is to ascertain whether the role of the strap muscles in lowering F0 is
analogous to that of the cricothyroid in raising F0. An EMG investigation of
the sternohyoid and cricothyroid muscles was performed with speakers of English
and Thai. It was found that there were indeed peaks of strap activity during
low F0 and peaks of cricothyroid activity during high F0. However, examination
of the timing of muscle activity with respect to F0 revealed that the
cricothyroid differs from the strap muscles in that the cricothyroid begins to
increase in activity prior to the onset of the F0 rise, whereas the increase of
strap muscle activity begins after the onset of the F0 fall.

It is rather well known that the cricothyroid muscle is the laryngeal muscle
primarily responsible for raising the fundamental frequency (F0) in speech.
There is less agreement as to which laryngeal muscle or muscles are responsible
for lowering F0 in speech. Several electromyographic (EMG) studies with speech
have reported an association of strap muscle activity, particularly the
sternohyoid, with low F0, and these studies suggest that the sternohyoid is an
active mechanism for lowering F0. Other studies have shown that there is a
decrease of cricothyroid activity associated with low F0 and have suggested
that the cricothyroid is a passive mechanism for lowering F0.

In this paper we examine more carefully the roles of the sternohyoid and the
cricothyroid in lowering F0. Electromyographic experiments were carried out
with a native speaker of Thai and a native speaker of American English. For
Thai the utterances examined were the falling tones on three different syllable
types, which varied according to vowel and initial consonant: /bii, pii, buu/.
Each syllable was preceded by a one-syllable carrier phrase. Figure 1 shows
typical results for the Thai falling tone. It can be seen that there is a
decrease in cricothyroid activity and an increase in sternohyoid activity
associated with the falling F0.

*Paper presented at the 90th meeting of the Acoustical Society of America,
 San Francisco, Calif., 3-7 November 1975.

+Also University of Connecticut, Storrs.

++Naval Underwater Systems Center, New London, Conn.

[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]




Figure 1

Figure 2 shows the utterances examined for English: the falling contours
occurred on the stressed words in the sentences "BEV loves Bob" and "Bev LOVES
Bob." Both English and Thai utterance types were chosen from a larger body of
data because the onset of the F0 falls was easily discernible. At least 16
tokens of each utterance type were averaged for English and Thai speakers.
Hooked-wire electrodes were used, and the data were processed using the Haskins
Laboratories computerized EMG processing system (Hirose, Gay, and Strome, 1971;
Kewley-Port, 1973).



In analyzing the data, we looked at the timing of the activity of the
sternohyoid and cricothyroid muscles in relation to the F0 falls. Specifically,
as shown in Figure 3, we measured the time at which the cricothyroid activity
began to decrease, and the time at which the sternohyoid activity began to
increase, both relative to the time at which the F0 began to fall.
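The measurement described above can be illustrated with a small sketch. The report does not say how the onset of a change in activity was located, so the rule used here (first sample that departs from a baseline mean by more than a fixed fraction) is purely an assumption, as are the function names, the sample values, and the parameter defaults. Only the sign convention, negative for lead and positive for lag relative to the F0 fall, follows the text.

```python
def onset_time(times_ms, values, direction, baseline_n=5, threshold=0.1):
    """Return the first time at which `values` departs from its baseline
    mean by more than `threshold` (a fraction of the baseline) in the
    given direction: +1 for an increase, -1 for a decrease.
    Returns None if no such departure is found."""
    baseline = sum(values[:baseline_n]) / baseline_n
    for t, v in zip(times_ms[baseline_n:], values[baseline_n:]):
        if direction * (v - baseline) > threshold * abs(baseline):
            return t
    return None


def relative_onset(muscle_onset_ms, f0_fall_onset_ms):
    """Timing relative to the F0 fall: negative = lead, positive = lag."""
    return muscle_onset_ms - f0_fall_onset_ms


if __name__ == "__main__":
    # Hypothetical averaged envelopes sampled every 20 msec.
    times = list(range(0, 300, 20))
    ct = [100] * 6 + [85, 60, 40, 30, 30, 30, 30, 30, 30]   # decreasing
    sh = [20] * 12 + [25, 40, 60]                            # increasing
    f0_fall = 180
    print(relative_onset(onset_time(times, ct, -1), f0_fall))
    print(relative_onset(onset_time(times, sh, +1), f0_fall))
```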



Figure 2: F0 contours (Hz) against time (msec) for the utterances "BEV loves
Bob." and "Bev LOVES Bob."



Figure 3: Schematic presentation of cricothyroid and sternohyoid activity in
relation to F0 fall.



The results for all tokens are shown in Figure 4. The zero reference point
indicates the time at which the F0 begins to fall. It is very important to
notice that for both the English and Thai speaker the cricothyroid activity
begins to decrease prior to the F0 fall, whereas the sternohyoid does not begin
to increase until after the F0 fall has begun.

Returning now to our basic question of whether either, neither, or both of
these muscles can be responsible for the F0 fall, it is clear from the above
that the cricothyroid begins to decrease before the F0 fall. It appears,
therefore, that the cricothyroid can initiate the F0 fall by passive
relaxation. The sternohyoid, on the other hand, does not begin to increase in
activity until after the fall in F0. Thus it seems that the sternohyoid does
not initiate the F0 falls that we have investigated, although it is clear that
the sternohyoid is involved in some way with low F0.

We feel that we must be careful in interpreting these results not to
overgeneralize by implying that the sternohyoid can never initiate falls in F0.
The data in this study are extremely restricted: limited to sharp falls in
English in utterance nonfinal position and falling tones in Thai. In both cases
the F0 falls were from a high to low value. We are expanding the data base to
look at what happens when the F0 falls from a mid to low value. In fact, recent
examination of the final fall in the mid tone utterances by the Thai speaker in
this paper suggests that there may indeed be instances in which the sternohyoid
begins to peak prior to the fall in F0. This has led us to speculate about a
modal shift theory of F0 lowering. That is, the Thai data suggest that the
speaking range can be divided into high, mid, and low voice range, and that an
F0 drop from the high to mid range might be accomplished by relaxing the
cricothyroid, whereas a drop from mid to low range involves an increase in
sternohyoid activity. This notion will be elaborated in future work.

Figure 4: Distribution of tokens by timing relative to the F0 fall reference
point (abscissa: msec lead to msec lag, -160 to 160) for the Thai speaker
(means -62 and 77) and the English speaker (means -73 and 55).

The mechanism of sternohyoid action in lowering F0 is not clear. We are
still investigating this, as well as other related questions about the strap
muscles in speech: specifically, how do pitch falls interact with jaw opening;
how does F0 interact with vowel and consonant effects; and how do other strap
muscles (such as sternothyroid and thyrohyoid) interact with the sternohyoid
and each other in these speech activities?

REFERENCES 

Hirose, H., T. Gay, and M. Strome. (1971) Electrode insertion techniques for
   laryngeal electromyography. J. Acoust. Soc. Am. 50, 1449-1450.
Kewley-Port, D. (1973) Computer processing of EMG signals at Haskins
   Laboratories. Haskins Laboratories Status Report on Speech Research SR-33,
   173-183.







Laryngeal Muscle Activity in Stuttering*
Frances J. Freeman+ and Tatsujiro Ushijima++



ABSTRACT

Laryngeal muscle activity during fluent and stuttered utterances was
investigated using multichannel electromyography. Analysis revealed that
stuttering was accompanied by high levels of laryngeal muscle activity and
disruption of the normal reciprocity between abductor and adductor forces. The
results demonstrate the existence of a laryngeal component in stuttering and
show a strong correlation between abnormal laryngeal muscle activity and
perceived moments of stuttering.



INTRODUCTION

For almost a century and a half writers have proposed models of the
stuttering block that incorporate an important, perhaps critical, laryngeal
component (Arnott, 1828; Müller, 1833; Hunt, 1861; Kenyon, 1943; Moravek and
Langova, 1967; Wyke, 1971; and Schwartz, 1974). Recently, an increasing number
of studies have indirectly implicated the phonatory mechanism in stuttering
(Stromsta, 1965; Wingate, 1969, 1970; Adams and Reis, 1971, 1974; Agnello,
1971; Brenner, Perkins, and Soderberg, 1972).



*Two versions of this paper were presented in 1975: "Incoordination and Tension
 in Stuttering: Further Results of Multichannel Electromyographic Experiments,"
 by F. J. Freeman, G. J. Borden, M. Dorman, S. Niimi, and T. Ushijima, presented
 at the 50th annual convention of the American Speech and Hearing Association,
 Washington, D.C., 21-24 November; and "Dysfluency and Phonation: An
 Electromyographic Investigation of Laryngeal Activity Accompanying the Moment
 of Stuttering," by F. Freeman, T. Ushijima, M. F. Dorman, and G. J. Borden,
 presented at the 8th International Congress of Phonetic Sciences, Leeds,
 England, 17-23 August. This article is to appear in The Journal of Speech and
 Hearing Research.

+Also City University of New York and Adelphi University, Garden City, N.Y.

++Also University of Tokyo, Japan.

Acknowledgment: The authors gratefully acknowledge the invaluable
contributions made by Katherine S. Harris and Hajime Hirose. They also wish to
thank Norma Rees, Oliver Bloodstein, Irving Hochberg, Gerald McCall, Michael
Dorman, Fredericka Bell-Berti, and Diane Kewley-Port for their counsel and
assistance.

[HASKINS LABORATORIES: Status Report on Speech Research SR-45/46 (1976)]



Direct evidence of laryngeal involvement in stuttering has emerged from
five physiological studies. Chevrie-Muller (1963) used the glottal graph to
study 27 stutterers and reported abnormal laryngeal activity that included
arhythmic vocal-fold vibrations and unpredictable glottal openings. Fujita
(1966) took posterior-anterior laryngeal X rays of a stutterer and found
abnormal activity that included irregular and inconsistent opening and closing
of the pharyngo-laryngeal cavity and asymmetric tight closure of the glottis.
Ushijima, Kamiyama, Hirose, and Niimi (1965); Conture, Brewer, and McCall
(1974); and Freeman, Dorman, Ushijima, and Niimi (1975) used a fiberoptic
endoscope to view the larynx during stuttering and reported abnormal activity
similar to that described by Chevrie-Muller and Fujita. Conture et al. (1974)
reported that the abnormal laryngeal activity they observed was suggestive of
disturbance in the smooth, reciprocal interplay between agonist and antagonist
laryngeal muscles.

The present research used multichannel electromyography (EMG) to
investigate physiological events that occur in conjunction with moments of
stuttering. Its primary aim was to describe the laryngeal muscle activity that
accompanies stuttering.

METHOD

The EMG techniques used have been developed in a series of experiments
investigating normal laryngeal muscle activity in phonation and speech
(Faaborg-Andersen, 1957; Hirano and Ohala, 1969; Hirano, Ohala, and Vennard,
1970; Hirose, 1971; Shipp and McGlone, 1971; Gay, Strome, Hirose, and
Sawashima, 1972; Hirose and Gay, 1972, 1973). The experimental procedures were
described by Hirose (1971) while data collection and processing were discussed
by Port (1971) and Kewley-Port (1973, 1974).

Subjects

The subjects for the experiments were four adult males: D.M., P.N., G.G.,
and C.D. They were selected both because of their willingness to undergo the
procedures required for the experiments and because they were anatomically
suitable for laryngeal electromyography. The subjects used were the first four
suitable individuals located. Subjects G.G. and C.D. were considered mild to
moderate stutterers, while D.M. and P.N. were considered severe. They ranged in
age from 22 to 47. All had begun to stutter in childhood and each had received
some form of therapy.



Procedure 



In each case the objective was to secure simultaneous recordings from the
five intrinsic laryngeal muscles (cricothyroid, CT; posterior cricoarytenoid,
PCA; interarytenoid, INT; thyroarytenoid, TA; and lateral cricoarytenoid, LCA)
and at least three of the upper tract articulator muscles (inferior
longitudinal, IL; superior longitudinal, SL; genioglossus, GG; and orbicularis
oris, OO). Recordings from an extrinsic laryngeal strap muscle, the sternohyoid
(SH), were taken for subject G.G.

With one exception (OO for subject G.G.), hooked-wire electrodes (Basmajian
and Stecko, 1962) were used. Detailed descriptions of each insertion are given
in Hirose (1971) and Freeman (1975). After each insertion, the
electrode-bearing needle was withdrawn leaving the electrodes hooked into the
target muscle.

The correct placement of an electrode in a specified muscle was verified in
a two-step procedure. First, after each insertion, oscilloscope and
amplifier-speaker systems were used for monitoring muscle activity during
performance of a series of specified gestures and maneuvers. If the patterns of
activity from the insertion site differed from the patterns known to be typical
for the target muscle, the electrodes were removed and a new insertion was made
for that muscle. Second, recordings were made as the subject performed the
critical maneuvers. Using the recordings, final verification was based on
examination of the simultaneous activity patterns from each insertion site.
Table 1 lists the critical test maneuvers used, and presents a profile of the
activity patterns against which each laryngeal insertion was verified. If an
insertion could not be verified according to these criteria, the recordings
from that site were excluded from the body of data. In cases where spatial
proximity makes contamination from adjacent muscles possible, verification was
based on demonstrable functional differentiation between the two muscles in
question. As indicated by Table 1, functional differentiation is possible
between any pair of laryngeal muscles except the LCA and the TA. For these two
muscles the patterns are very similar, differing only in degree (level of
activity) for some maneuvers.

In addition to the insertion verification procedures, other possible
sources of error were considered. Calibration signals of 300 µV, recorded at
intervals during the experiments, were compared to verify reliability of
recording and playback equipment. The raw EMG tracings were examined visually
for (1) abrupt changes in the level of recording from any given muscle and
(2) the presence of movement artifacts. Table 2 summarizes the insertions
attempted and reports the success rate in achieving verifiable quality
recordings from each muscle for each subject.

The design of the study required that comparable fluent and stuttered
tokens be obtained from each subject. Since stuttering is a behavior known to
be highly variable, the experimental procedures were necessarily flexible.

For subjects P.N., G.G., and D.M. an adequate number of stuttered tokens
were obtained by having them read a selected prose passage. Fluent samples were
secured by repeated readings (adaptation) and by use of selected
fluency-evoking conditions including choral reading, rhythm reading,
whispering, reading under white noise masking, and reading under delayed
auditory feedback (DAF) (Wingate, 1969, 1970).

Subject C.D. did not have audible blocks while reading the experimental
passage. Therefore, he engaged in conversation, making frequent use of feared
"difficult" words. In the choral reading condition, the experimenter and C.D.
read a list of sentences transcribed from their spontaneous conversation. The
recordings made under the other fluency-evoking conditions consisted of
spontaneous conversation and repetitions of sentences in which blocks had
previously occurred.

RESULTS 



The patterns of successful insertion (Table 2) and the procedures used in
eliciting fluent and stuttered speech samples yielded results that were not
parallel for all four subjects. For two subjects (C.D. and D.M.) recordings
were obtained for the glottal abductor (PCA) and for glottal adductors. It was
possible with these two subjects to study the coordination of the reciprocal
activity of the antagonist forces in fluent and stuttered utterances.

TABLE 1: Summary of activities used in verification of electrode placement
         for laryngeal muscle insertions.

[Table body not legible in the source scan. Rows: PCA, INT, LCA, TA, CT, SH.
Legible column headings: speech activities [ha], [pa], [ba], [ʔa].]

+   indicates relatively higher levels of activity
-   indicates relatively lower levels of activity or suppression
x   indicates a particularly characteristic pattern of activity
//  indicates that the maneuver calls for suppression followed by activity
*   indicates that at the upper extremes of the subject's singing range
    activity may occur
**  indicates that activity occurs only at the upper and lower extremes of
    the subject's singing range

TABLE 2: Verified insertions for each subject and for each muscle over the
         series of experiments.*

                         TOTALS
SUBJECT    Laryngeals   Upper tracts   Combined
P.N.           3             4             7
G.G.           3             2             5
D.M.           5             2             7
C.D.           3             2             5
TOTALS        14            10            24

Totals by muscle
Laryngeal muscles:          PCA 2   INT 3   LCA 3   TA 4   CT 1   SH 1
Upper tract articulators:   IL 1    SL 3    GG 2    OO 4

*The Haskins Laboratories multichannel EMG recording and processing system
 provided for simultaneous processing of recordings from 8 channels. In all
 cases 8 insertions were attempted. For this series of experiments, 32
 insertions were attempted, and of these 24 resulted in successful, verifiable
 recordings. Two PCA insertions and one INT insertion were impossible because
 of the subjects' anatomy and gag reflexes; one GG, one LCA, and three CT
 recordings were rejected because (1) they could not be verified, or (2) they
 did not result in good quality recordings, or (3) they exhibited evidence of
 movement artifacts.
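The insertion counts given in the note to Table 2 can be checked with a few lines of arithmetic. The sketch below uses only the figures stated there (4 subjects, 8 channels each, and the listed failures); the variable names are illustrative.

```python
SUBJECTS = 4
CHANNELS_PER_SUBJECT = 8

# Failures as listed in the note to Table 2.
impossible = {"PCA": 2, "INT": 1}        # anatomy and gag reflexes
rejected = {"GG": 1, "LCA": 1, "CT": 3}  # unverified, poor quality, artifacts

attempted = SUBJECTS * CHANNELS_PER_SUBJECT
failed = sum(impossible.values()) + sum(rejected.values())
verified = attempted - failed

if __name__ == "__main__":
    print(attempted, failed, verified)
```

The result agrees with the combined total of verified insertions reported in the table.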

With subject C.D., both PCA and INT recordings were obtained for 49
utterances of the same consonant-vowel (CV) sequence, allowing a correlation
study of abductor-adductor reciprocity in fluent and stuttered utterances.
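One way to picture such a correlation study is a product-moment correlation between paired PCA and INT measurements taken from the same utterances. The passage does not say which statistic or which measurements were used, so the Pearson coefficient, the per-utterance values, and the function name below are assumptions made for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical levels (microvolts) for four utterances: when the
    # abductor (PCA) is high the adductor (INT) is low, and vice versa.
    pca = [10.0, 20.0, 30.0, 40.0]
    int_ = [40.0, 30.0, 20.0, 10.0]
    print(pearson_r(pca, int_))
```

Under this reading, reciprocal abductor-adductor activity would show up as a strongly negative coefficient, and a breakdown of reciprocity as a coefficient near zero or positive.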

For the three subjects (D.M., G.G., and P.N.) who stuttered on the oral
readings of the experimental passage, it was possible to compare the averaged
levels of muscle activity for selected sentences in the stuttered and fluent
readings. For C.D. (who did not stutter while reading), the average of the peak
values for stuttered and fluent utterances of the same word were compared.
These procedures yielded information on two aspects of muscle activity in
stuttering: coordination and levels of muscle activity.

Findings Related to Levels of Muscle Activity

In the tracings of the "raw" (unrectified) EMG signal, strength of muscle
activity is represented both by the amplitude and frequency of the spikes.
Figures 1-3 present examples of raw EMG recordings for D.M., P.N., and G.G. The
lower graph in each illustration shows the activity recorded from these same
muscles under one of the fluency-evoking conditions. The bottom line in each
graph is an oscillographic tracing of the output of the subject's microphone. A
phonetic transcription is placed below each graph. In each case the subject was
reading the same portion of the experimental passage. Visual inspection of the
"raw" EMG data indicates that the laryngeal muscles maintained higher levels of
activity during the first (stuttered) reading than during the evoked (fluent)
reading.

The differences observed in the "raw" tracings were, of course, apparent in
the processed (rectified) EMG. Figure 4 shows recordings from four muscles for
subject P.N. The graphs on the left of the illustration trace the course of
the EMG activity for these muscles during a stuttered utterance of the word
"causes," which occurred in the first (stuttered) reading. The fluent utterance
is from his reading under white noise masking. The "raw" EMG for these
utterances is shown in Figure 2. Figure 5 shows the activity of a single
muscle, the LCA, for three utterances of the word "effect." Subject D.M.
repeated the word three times, with progressive adaptation from a severe block
to a mild block, to a fluent utterance. The reduction of activity in the LCA
correlated with the reduction in degree of dysfluency.

In order to quantify these differences in levels of muscle activity,
selected speech samples (consisting in each case of readings of the first
paragraph of the experimental passage) were divided into segments of 2-sec
duration. The average level of activity in microvolts was calculated for each
muscle for each 2-sec segment. The mean values for the 2-sec segments
constituting one speech sample were then averaged together, yielding a single
mean value for each muscle for each speech sample.
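The two-stage averaging described above can be sketched as follows for one muscle and one speech sample. The input is assumed to be a list of rectified EMG values in microvolts at a known sampling rate; the function names, the sampling rate in the example, and the decision to drop an incomplete final segment are assumptions, since the report does not give those details.

```python
def segment_means(samples_uv, sample_rate_hz, segment_sec=2.0):
    """Mean level (microvolts) of each complete segment of `segment_sec`
    seconds. An incomplete trailing segment is dropped (an assumption)."""
    seg_len = int(round(sample_rate_hz * segment_sec))
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    means = []
    for start in range(0, len(samples_uv), seg_len):
        segment = samples_uv[start:start + seg_len]
        if len(segment) == seg_len:
            means.append(sum(segment) / seg_len)
    return means


def speech_sample_mean(samples_uv, sample_rate_hz, segment_sec=2.0):
    """Average of the per-segment means: one value per muscle per sample."""
    means = segment_means(samples_uv, sample_rate_hz, segment_sec)
    if not means:
        raise ValueError("speech sample is shorter than one segment")
    return sum(means) / len(means)


if __name__ == "__main__":
    # Hypothetical rectified EMG at 10 samples per second.
    emg = [100.0] * 20 + [200.0] * 20 + [50.0] * 5
    print(segment_means(emg, 10))
    print(speech_sample_mean(emg, 10))
```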

For each speech sample, utterance content was held constant, but the total
length of the sample (number of 2-sec segments) varied with utterance rate. For
each subject, the first (stuttered) reading was compared with each of the
readings under the fluency-evoking conditions. Figure 6 illustrates the
differences derived from this comparison, by converting the microvolt values to
percentages, using the mean level of the first reading as a reference.

Figure 1

Figure 2

Figure 3

Figure 4: Comparison of muscle activity in the interarytenoid (INT), lateral
cricoarytenoid (LCA), thyroarytenoid (TA), and genioglossus (GG) for subject
P.N.'s fluent and stuttered utterances of the word "causes."

Figure 5: Comparison of lateral cricoarytenoid (LCA) muscle activity for
strongly stuttered, mildly stuttered, and fluent utterances of the word
"effect" as spoken by subject D.M.
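The conversion to percentages is a simple ratio against the first (stuttered) reading. A minimal sketch, with hypothetical microvolt values and condition names used only for illustration:

```python
def percent_of_reference(value_uv, reference_uv):
    """Express a mean level as a percentage of the reference mean level."""
    if reference_uv == 0:
        raise ValueError("reference level must be nonzero")
    return 100.0 * value_uv / reference_uv


if __name__ == "__main__":
    # Hypothetical mean levels (microvolts) for one muscle of one subject.
    first_reading = 200.0
    conditions = {"choral reading": 120.0, "white noise masking": 150.0}
    for name, level in conditions.items():
        print(name, percent_of_reference(level, first_reading))
```

By construction the first reading itself is 100 percent, so values below 100 indicate a lower mean level under the fluency-evoking condition.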

Differences in levels of activity evident in these comparisons are directly
related to the two effects of the fluency-evoking conditions on the production
of the subjects. In each case, the fluency-evoking conditions resulted in (1) a
decrease in the frequency of dysfluencies (measured as percentage of syllables
stuttered) and (2) an increase in utterance rate (measured as syllables per
second). Figure 7 graphically illustrates these findings for the three
subjects. These results, which relate decrease in dysfluencies to increase in
rate, are in agreement with a number of other studies of evoked fluency (Adams
and Hutchinson, 1974; Conture, 1974). However, the two types of change in the
utterance would generate contradictory hypotheses relating to changes in levels
of muscle activity. That is, taken alone (without concomitant changes in
utterance rate), a marked decrease in stuttering would be anticipated to
accompany a decrease in average level of muscle activity. On the other hand,
increases in utterance rate will be accompanied by an increase in average level
of muscle activity for two reasons. First, an increase in syllables per second
results in an increase in the number of speech gestures per second, and hence
an increase in the average level of muscle activity per 2-sec segment. Second,
an increase in rate results in a higher velocity of articulator movement, which
requires a higher level of muscle activity (Bigland and Lippold, 1954; Gay and
Hirose, 1973; Kuehn, 1973). Clearly, two opposite and potentially canceling
effects were operative simultaneously.

In order to neutralize the effects of the increases in utterance rate, the
syllables in each 2-sec segment were counted, and the average level of muscle
activity in each segment was divided by the number of syllables uttered in that
segment. The resulting means were used to calculate an average level per
syllable for each muscle for each speech sample. Results for this calculation
are illustrated graphically in Figure 8.
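
The rate normalization just described can be sketched in a few lines. This is
only an illustration under stated assumptions, not the program used in the
study: the function names and all numbers are hypothetical, and skipping
segments that contain no syllables is an assumption the text does not address.

```python
from statistics import mean


def level_per_syllable(segment_levels_uv, syllable_counts):
    """Average EMG level per syllable over a speech sample.

    segment_levels_uv: mean level (microvolts) of each 2-sec segment.
    syllable_counts:   number of syllables uttered in each segment.
    Each segment level is divided by its syllable count, and the
    quotients are then averaged over the sample.
    """
    if len(segment_levels_uv) != len(syllable_counts):
        raise ValueError("one syllable count is needed per segment")
    # Assumption: segments with no syllables (pauses) are left out.
    quotients = [level / count
                 for level, count in zip(segment_levels_uv, syllable_counts)
                 if count > 0]
    return mean(quotients)


def percent_of_reference(value, reference):
    """Express a level as a percentage of the reference (first) reading."""
    return 100.0 * value / reference


# Hypothetical data: the fluent reading is faster (more syllables per segment).
stuttered = level_per_syllable([120.0, 90.0, 150.0], [4, 3, 5])
fluent = level_per_syllable([100.0, 120.0, 80.0], [5, 6, 4])
print(round(percent_of_reference(fluent, stuttered), 1))
```

In this made-up case the per-segment levels of the two readings are similar,
but the per-syllable level of the faster reading is clearly lower, which is the
effect the normalization is meant to expose.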

Figure 9 summarizes the results relating to decreases in activity. The
broken line labeled 100 percent indicates the reference level of the first
(stuttered) reading, while the vertically striated bars are the average of
all the upper tract articulator muscles for all the fluency-evoking conditions.
The horizontally striated bars are the average of all the laryngeal muscles for
all conditions.

The data collected on subject C.D.'s 49 utterances of the words "syllable"
and "syllables" were used to learn whether the peak levels of muscle activity
were different for fluent and stuttered utterances. In each utterance, the time
period between the initial muscle activity for the production of the voiceless
fricative [s] (indicated by activity in the superior longitudinal for raising
the tongue tip and activity in the PCA for opening the glottis) and the point in
the acoustic trace that indicated the onset of voicing for the vowel [i] was
identified. Within this time period, the highest peak of activity was
identified for each muscle. The level (in microvolts) for this peak of activity
was computed for each muscle for each utterance. The experimenter, after
listening to audio recordings, identified 23 utterances as stuttered and 26 as
fluent. The peak values for the utterances judged stuttered were averaged for
each muscle and the results compared with similarly derived averages from the
utterances judged fluent. Results are graphically illustrated in Figure 10, where

Figure 6: Comparison of average levels of muscle activity per 2-sec segment
          for subjects D.M., P.N., and G.G. [Panels show upper tract
          articulator muscles and laryngeal muscles; 0 indicates level of
          activity for the first (stuttered) reading.]



Figure 7



Figure 8: Comparison of average levels of muscle activity per syllable for
          subjects D.M., P.N., and G.G. [Two reading conditions shown in
          Figure 6 were omitted here. The second (stuttered) reading was
          omitted for subject P.N. because it was not significantly different
          from the first (stuttered) reading; and the choral reading condition
          was omitted for D.M. because the two voices on the audio recording
          prevented an accurate syllable count. Axes show percent decrease and
          percent increase in activity; 0 indicates level of activity for the
          first (stuttered) reading.]



Figure 9: Comparisons of average levels per 2-sec segment and average levels
          per syllable for upper tract articulator muscles and for laryngeal
          muscles for subjects D.M., P.N., and G.G. [Panels: Average Levels Per
          Two-second Segment; Average Levels Per Syllable. Bars are shown for
          D.M., P.N., G.G., and totals; 100% indicates levels for the first
          (stuttered) reading.]



Figure 10



the average peak value for the stuttered utterances serves as the reference and
the average peak value for the fluent utterances is expressed as a percent.
Differences for four of the five muscles were found to be significant at the
.001 level of confidence.
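
The peak comparison can be illustrated with a short sketch. It is a
hypothetical reconstruction of the arithmetic only: the function names, the
sample indices, and the microvolt values are invented, and the significance
test used in the study is not reproduced because the text does not name it.

```python
from statistics import mean


def peak_in_window(samples_uv, start, end):
    """Highest level (microvolts) between two sample indices, end exclusive.

    In the study the window ran from the first muscle activity for [s]
    to the onset of voicing for the vowel; here it is just two indices.
    """
    return max(samples_uv[start:end])


def fluent_peak_as_percent(stuttered_peaks, fluent_peaks):
    """Mean fluent peak as a percentage of the mean stuttered peak."""
    return 100.0 * mean(fluent_peaks) / mean(stuttered_peaks)


# Hypothetical trace for one muscle in one utterance.
trace = [10.0, 40.0, 95.0, 60.0, 20.0]
print(peak_in_window(trace, 1, 4))

# Hypothetical per-utterance peaks for one muscle.
print(fluent_peak_as_percent([200.0, 240.0, 220.0], [110.0, 99.0, 121.0]))
```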

Findings Related to Coordination

The study of disruption of coordination in stuttered speech is restricted
to some extent by our imprecise knowledge of many aspects of coordination in
normal speech. On one point, however, studies of normal laryngeal articulations
have provided relatively clear and consistent findings. These studies indicate
that the abductor and adductor forces in the larynx normally act with
reciprocity. When the glottal abductor (the PCA) is strongly active, the
adductors (INT, TA, and LCA) are suppressed, and conversely, when the adductors
are strongly active, the abductor is suppressed. Since recordings from the
abductor were secured for two of the four subjects, it was possible to
investigate the reciprocal activity of the antagonist muscles.

Figure 11 shows recordings from three muscles for subject D.M. The graph
on the left-hand side is from a stuttered utterance of the word "less," while
the graph on the right-hand side is from a fluent utterance of the same word.

The boxes at the top of the graph contain phonetic symbols and represent
the relative length of each segment as measured in oscillographic tracings. The
lineup, or 0 point, on each graph represents the end of voicing for the vowel.
In the bottom graph, the peaks of activity for the SL relate to tongue tip
raising for the [l] and the [s]. During the prolongation of the [l] sound, the
PCA (glottal abductor) and the TA (a glottal adductor) were both active. During
the fluent utterance, these two muscles showed reciprocal activity.

Figure 12 shows three utterances of the word "ancient," with progressive
adaptation from a strong block to a mild block to a fluent utterance. During
the prolongation of the [e] in the strong block, the PCA (glottal abductor) and
the TA and the LCA (glottal adductors) were all active. During the fluent
utterance the antagonist muscles acted reciprocally.

Figure 13 shows recordings from four muscles for subject C.D. for
contrasting stuttered and fluent utterances of the word "syllable." The lineup
point for both utterances was on the onset of voicing for the first vowel. In
the top graph, the peaks of activity in the SL were related to tongue tip
raising. During the stuttered prolongation of the initial voiceless fricative,
the PCA (glottal abductor) and the INT (glottal adductor) were both active.
During the fluent utterance the antagonist forces acted reciprocally.

The first syllable of the word "syllable" has phonetic content suitable for
a correlation study of PCA-INT activity. During the first segment of the
syllable, the PCA was active and the INT was suppressed for the production of
the voiceless fricative. The INT was then active while the PCA was suppressed
for the production of the vowel. This pattern is shown in the fluent utterance
of Figure 13. If the normal activity of these antagonist muscles were to be
correlated over time, a negative correlation should result. And, indeed, the
plotting of such a correlation for the fluent utterance in Figure 13 yielded an
r of -.83. Conversely, the plotting of the correlation between the INT and the
PCA for the stuttered utterance in Figure 13 yielded an r of +.80.
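
The reasoning can be made concrete with a small sketch that correlates two
sampled activity curves. It assumes that the r reported here is an ordinary
product-moment (Pearson) coefficient, which the text does not state; the
function name and the sample values are hypothetical and are not taken from
Figure 13.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally sampled curves."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two curves of equal length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant curve")
    return sxy / sqrt(sxx * syy)


# Hypothetical activity levels (microvolts) at successive sample points.
# Reciprocal pattern: the abductor falls while the adductor rises.
pca_fluent = [80.0, 70.0, 50.0, 30.0, 10.0, 5.0]
int_fluent = [5.0, 10.0, 30.0, 60.0, 80.0, 90.0]
# Cocontraction pattern: both muscles rise and fall together.
pca_stuttered = [20.0, 40.0, 70.0, 90.0, 85.0, 60.0]
int_stuttered = [15.0, 35.0, 60.0, 95.0, 80.0, 55.0]

print(pearson_r(pca_fluent, int_fluent) < 0)        # reciprocal: negative r
print(pearson_r(pca_stuttered, int_stuttered) > 0)  # cocontraction: positive r
```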



Figure 11



Figure 12



Figure 13: Comparison of muscle activity in the superior longitudinal (SL),
           posterior cricoarytenoid (PCA), interarytenoid (INT), and
           thyroarytenoid (TA) for subject C.D.'s stuttered and fluent
           utterances of the word "syllable." [Panels: Stuttered and Fluent;
           levels in microvolts, time in msec.]



The program (EMGCORI) (Kewley-Port, 1973) used for these calculations
plotted and correlated points at 5-msec intervals. Correlations were plotted
for the time period between the first activity of the SL and PCA for the [s] and
the onset of voicing for the vowel [i]. Coefficients of correlation were
calculated for 49 utterances of the words "syllable" and "syllables." As
previously discussed, the experimenter had judged 23 of these utterances to be
stuttered and 26 to be fluent.

Of the 23 utterances judged stuttered, 20 yielded positive correlations and
3 yielded negative correlations; while of the 26 utterances judged fluent, 19
yielded negative correlations and 7 yielded positive correlations. These
findings are graphically illustrated in Figure 14.

In Figure 14, the 23 stuttered utterances are shown on the top half of the
graph; while the 26 fluent utterances are shown on the lower half. All positive
correlations are shown to the right of center, and negative correlations to the
left. There is a significant positive correlation between abductor and adductor
activity for the stuttered utterances (p < .01, sign test); there is a
significant negative correlation between abductor and adductor activity for the
fluent utterances (p < .05, sign test).
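
The sign-test results can be checked from the counts given above. The sketch
below assumes a two-sided exact binomial form of the sign test with equal
chances of a positive or negative correlation under the null hypothesis; the
text does not say which form was used, and the function name is hypothetical.

```python
from math import comb


def sign_test_p(n_positive, n_negative):
    """Two-sided exact sign test against a 50/50 split of signs."""
    n = n_positive + n_negative
    k = max(n_positive, n_negative)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


p_stuttered = sign_test_p(20, 3)   # 20 positive, 3 negative correlations
p_fluent = sign_test_p(7, 19)      # 7 positive, 19 negative correlations
print(p_stuttered < 0.01, p_fluent < 0.05)
```

Under these assumptions the probabilities come to roughly .0005 for the
stuttered utterances and .029 for the fluent utterances, in line with the
significance levels reported.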

CONCLUSIONS

The results of the present study generate and support the following
statements:

1. A laryngeal component of stuttering clearly exists.

2. Abnormal laryngeal muscle activity accompanied stuttering in all
four subjects examined.

3. Two aspects of abnormal laryngeal muscle activity in stuttering
are (a) high levels of muscle activity and (b) disruption of
abductor-adductor reciprocity.

4. The cooccurrence of the three phenomena, (a) high levels of laryngeal
muscle activity, (b) disrupted abductor-adductor reciprocity,
and (c) perceived stuttering blocks, would support the hypothesis
that the three are intimately related.



DISCUSSION

Generalization of Findings



The EMG results derived from four subjects take on additional significance
when viewed in relation to the other physiological studies of laryngeal
functioning in stuttering (Chevrie-Muller, 1963; Fujita, 1966; Ushijima et al.,
1965; Conture, Brewer, and McCall, 1974). The picture emerging from these
experiments (which were conducted independently, used a variety of
instrumentations, and studied stutterers of three races who spoke three
different languages) is consistent and supports the view that laryngeal
involvement in stuttering is not an idiosyncratic phenomenon (Freeman, 1975).






In most cases, the present study has verified hypotheses of researchers who
used indirect approaches for studying phonation in stuttering. Adams and Reis
(1971, 1974), Adams and Hayden (1974), Adams, Riemenschneider, Metz, and Conture
(1974), and Agnello (1974) all predicted the initiation-of-phonation problem
demonstrated by subject C.D. and correlated with disrupted abductor-adductor
reciprocity. If these investigators are correct in their interpretation, then
they are observing indirectly in their subjects the same types of abnormal
muscle activity studied directly in the present research.

Comments on Levels of Muscle Activity

The data relating to differences in levels of muscle activity may be
interpreted in two ways, depending on the hypothesis espoused by the discussant.
Both viewpoints are worthy of consideration.

The first hypothesis assumes that a moment of stuttering is accompanied by
higher levels of muscle activity. It also assumes that the higher average
levels found for passages in which stuttering moments occur are the result of
averaging the high peak levels for the blocks with the normal base levels
accompanying the nonstuttered speech. Certainly the results of the present
research support the first contention of this hypothesis, namely, the stuttered
utterance of a word is accompanied by levels of laryngeal muscle activity higher
than those accompanying the fluent utterance of the same word (Figures 4, 5, 10,
11, 12, 13). However, if the raw EMG data (exemplified in Figures 1-3) is
inspected closely, it becomes apparent that phrases within the stuttered
readings in which no identifiable blocks occur are accompanied by levels of
muscle activity that are higher than those accompanying the utterance of the
same phrase in the fluent reading. Within the frame of this hypothesis, the
higher levels accompanying the words on which there is no identifiable blocking
can be explained in one of two ways: (1) by expanding time constraints on the
moment of stuttering to include events that precede or follow the identifiable
block, or (2) by assuming that in addition to the identified blocks, the
stutterer is also experiencing a number of moments of stuttering, or minimal
blocks, that are not recognized by the listener.

The second hypothesis assumes that the stutterer in specific communicative
environments habitually attempts to phonate while maintaining higher than normal
levels of laryngeal muscle activity. The high levels are viewed as being
counterproductive in fluent utterance of sequential speech segments; and it is
assumed that if the levels exceed some critical value, they will lead to a
breakdown in fluency, that is, a moment of perceived stuttering. The data
demonstrating lower levels of activity for the readings under the
fluency-evoking conditions can be interpreted as supporting this hypothesis. The
finding of higher levels for phrases that occur in the stuttered reading, but do
not include identifiable blocks, would also support this line of reasoning.

Differentiation between the two hypotheses is difficult because both would
predict similar patterns of correlation between levels of laryngeal muscle
activity and occurrence of moments of stuttering. Both would predict that the
highest levels of activity would coincide with identifiable blocks; both would
predict increases in levels of activity during the time periods preceding
identifiable blocks; and both would predict lower levels of activity during
periods of fluency. The generally elevated baseline of activity during fluent
utterance between blocks, which would be predicted by the second hypothesis,
might be testable if it were possible to define the temporal parameters of a
given "moment of stuttering." However, if a "moment of stuttering" is viewed as
including events that precede or follow the identifiable block by unspecified
time periods, it becomes difficult or impossible to define the beginning or the
end of a given "moment of stuttering." Although investigations of the temporal
relationship between identifiable blocks and levels of laryngeal muscle activity
are being conducted, no experimental method for testing the differential
validity of these two hypotheses has yet been devised. On the other hand, it is
also important to note that the two hypotheses are neither incompatible nor
mutually exclusive.

Comments on Disrupted Reciprocity

As described by Sherrington (1909), "reciprocal inhibition" facilitates
coordinated movement by agonist muscles through relaxation of antagonist
muscles. As demonstrated by Travill and Basmajian (1961), the antagonist in a
muscle pair usually relaxes completely while the agonist is active. Studies of
normal subjects, and indeed, recordings of the induced fluency readings of the
stuttering subjects, show highly consistent reciprocity between the abductor
(PCA) and the adductor group, particularly the INT. It is possible that
whispered speech may be produced by simultaneous contraction of the PCA and some
adductor muscles;¹ but for normal phonation the effects of abductor-adductor
cocontraction are clearly counterproductive. From the data collected on D.M.
and C.D., strong cocontraction of the laryngeal antagonists appears incompatible
with normal phonation. In many instances, cocontraction occurred during a
silent period just prior to an utterance. When cocontraction occurred during
sound production, audible disruptions accompanied the event. For both subjects,
the termination of cocontraction was almost invariably followed (50 to 150 msec)
by a fluent sounding utterance.

Normal, fluent utterance of a CV syllable requires a specific change of
laryngeal muscle tension pattern (this is true even if the consonant is voiced),
and a specific change in glottal state (glottal constriction is different for
consonants and vowels) within constrained time limits. Interpretation of the
EMG evidence suggests that the effect of cocontraction was to prevent, delay,
or inhibit the normal transition from the consonant into the vowel.

REFERENCES 

Adams, M. R. and R. Hayden. (1974) Stutterers' and nonstutterers' ability to
initiate and terminate phonation during nonspeech activities. Paper
presented at the Annual Convention of the American Speech and Hearing
Association, Las Vegas, Nev., 5-8 November.

Adams, M. R. and J. Hutchinson. (1974) The effects of three levels of auditory
masking on selected vocal characteristics and the frequency of dysfluency
of adult stutterers. J. Speech Hearing Res. 17, 682-688.

Adams, M. R. and R. Reis. (1971) The influence of the onset of phonation on
the frequency of stuttering. J. Speech Hearing Res. 14, 639-644.

Adams, M. R. and R. Reis. (1974) The influence of the onset of phonation on
the frequency of stuttering: A replication and re-evaluation. J. Speech
Hearing Res. 17, 752-754.

¹Thomas Shipp, 1975: personal communication.



Adams, M. R., S. Riemenschneider, D. E. Metz, and E. G. Conture. (1974) Voice
onset and articulatory constriction requirements in a speech segment, and
their relation to the amount of stuttering adaptation. Paper presented at
the Annual Convention of the American Speech and Hearing Association,
Las Vegas, Nev., 5-8 November.

Agnello, J. (1971) Transitional features of stutterers and nonstutterers.
ASHA: Journal of the American Speech and Hearing Association 1^(A).

Agnello, J. (1974) Laryngeal and articulatory dynamics of dysfluency
interpreted within a vocal tract model. In Vocal Tract Dynamics and Dysfluency,
ed. by L. M. Webster and L. C. Furst. (New York: Speech and Hearing
Institute).

Arnott, G. Niel. (1828) Elements of Physics as reported in Hunt (1861).

Basmajian, J. V. and G. A. Stecko. (1962) A new bipolar indwelling electrode
for electromyography. J. Appl. Physiol. 17, 849.

Bigland, B. and O. C. J. Lippold. (1954) The relation between force, velocity
and integrated electrical activity in human muscles. J. Physiol. 123,
214-224.

Brenner, N. C., W. H. Perkins, and G. A. Soderberg. (1972) The effect of
rehearsal on frequency of stuttering. J. Speech Hearing Res. 15, 474-482.






Chevrie-Muller, C. (1963) A study of laryngeal function in stutterers by the
glottal-graphic method. In Proc. VII Congress de la Societe Française de
Medicine de la Voix et de la Parole, Paris.

Conture, E. G. (1974) Some effects of noise on the speaking behavior of
stutterers. J. Speech Hearing Res. 17, 714-723.

Conture, E. G., D. W. Brewer, and G. N. McCall. (1974) Laryngeal activity
during the moment of stuttering: Some preliminary observations. Paper
presented at the Annual Convention of the American Speech and Hearing
Association, Las Vegas, Nev., 5-8 November.

Faaborg-Anderson, K. (1957) Electromyographic investigation of intrinsic
laryngeal muscles in humans. Acta Physiol. Scand., Suppl. 41, 140.

Freeman, F. J. (1975) The stuttering larynx: An electromyographic study of
laryngeal muscle activity accompanying stuttering. Unpublished doctoral
dissertation, City University of New York.

Freeman, F. J., M. F. Dorman, T. Ushijima, and S. Niimi. (1975) Laryngeal
dysfunction in stuttering: EMG and fiberoptic studies. Unpublished
manuscript.

Fujita, K. (1966) Pathophysiology of the larynx from the viewpoint of phonation.
J. Japan. Soc. Otorhinolaryngol. 69, 459.

Gay, T. and H. Hirose. (1973) Effect of speaking rate on labial consonant
production: A combined electromyographic/high-speed motion picture study.
Phonetica 27, 44-56.

Gay, T., M. Strome, H. Hirose, and M. Sawashima. (1972) Electromyography of
the intrinsic laryngeal muscles during phonation. Ann. Otol. Rhinol.
Laryngol. 81, 401-408.

Hirano, M. and J. Ohala. (1969) Use of hooked-wire electrodes for
electromyography of the intrinsic laryngeal muscles. J. Speech Hearing Res. 12,
362-373.

Hirano, M., J. Ohala, and W. Vennard. (1970) Regulation of register, pitch and
intensity of voice. Folia Phoniat. 22, 1-20.

Hirose, H. (1971) Electromyography of the articulatory muscles: Current
instrumentation and technique. Haskins Laboratories Status Report on Speech
Research SR-25/26, 73-86.

Hirose, H. (1974) Functional differentiation of the glottal adductors. Japan.
J. Otol. 77, 46-57.


Hirose, H. and T. Gay. (1972) The activity of the intrinsic laryngeal muscles
in voicing control: An electromyographic study. Phonetica 25, 140-164.

Hirose, H. and T. Gay. (1973) Laryngeal control in vocal attack: An
electromyographic study. Folia Phoniat. 25, 203-213.

Hirose, H. and T. Ushijima. (1974) The function of the posterior
cricoarytenoid in speech articulation. Haskins Laboratories Status Report on
Speech Research SR-37/38, 99-107.

Hunt, J. (1861) Stammering and Stuttering: Their Nature and Treatment, 1967
ed. (London: Hafner Publishing Co.).

Kenyon, E. L. (1943) The etiology of stammering: The psychophysiologic facts
which concern the production of speech sounds and of stammering. J. Speech
Hearing Dis. 8, 337-348.

Kewley-Port, D. (1973) Computer processing of EMG signals at Haskins
Laboratories. Haskins Laboratories Status Report on Speech Research SR-33,
173-184.

Kewley-Port, D. (1974) An experimental evaluation of the EMG data processing
system: Time constant choice for digital integration. Haskins Laboratories
Status Report on Speech Research SR-37/38, 65-72.

Kuehn, D. P. (1973) A cinefluorographic investigation of articulatory
velocities. Unpublished doctoral thesis, University of Iowa.

Moravek, M. and J. Langova. (1967) Problems of the development of the initial
tonus in stuttering. Folia Phoniat. 19, 109-116.

Müller, J. P. (1833) Elements of Physiology, trans. by Baly (1857), reported
in Hunt.

Port, D. K. (1971) The EMG data system. Haskins Laboratories Status Report
on Speech Research SR-25/26, 67-72.

Schwartz, M. (1974) The core of the stuttering block. J. Speech Hearing Dis.
39, 169-177.

Sherrington, C. S. (1909) Reciprocal innervation of antagonistic muscles.
Fourteenth note. On double reciprocal innervation. Proc. Royal Soc. B81,
249-268.

Shipp, T. and R. McGlone. (1971) Laryngeal dynamics associated with voice
frequency change. J. Speech Hearing Res. 14, 761-768.

Stromstra, C. (1965) A spectrographic study of dysfluencies labeled as
stuttering by parents. De Therapia Vocis et Loquelae 1, 317-320.

Travill, A. and J. V. Basmajian. (1961) Electromyography of the supinators of
the forearm. Anat. Rec. 139, 557-560.

Ushijima, T., G. Kamiyama, H. Hirose, and S. Niimi. (1965) Articulatory
movements of the larynx during stuttering (a film produced at the Research
Institute of Logopedics and Phoniatrics, Faculty of Medicine, University of
Tokyo).

Wingate, M. E. (1969) Sound pattern in "artificial" fluency. J. Speech
Hearing Res. 12, 677-686.

Wingate, M. E. (1970) Effect on stuttering of changes in audition. J. Speech
Hearing Res. 13, 861-873.

Wyke, B. (1971) The neurology of stammering. J. Psychosomatic Res. 15,
423-432.






II. PUBLICATIONS AND REPORTS 



Abramson, A. S. (1976) Static and dynamic acoustic cues in distinctive tones.
Journal of the Acoustical Society of America Suppl. 59, S42(A).

Blechner, M. J., R. S. Day, and J. E. Cutting. (1976) Processing two
dimensions of nonspeech stimuli: The auditory-phonetic distinction
reconsidered. Journal of Experimental Psychology: Human Perception and
Performance 2, 257-266.

Cutting, J. E. (1976) Auditory and linguistic processes in speech perception:
Inferences from six fusions in dichotic listening. Psychological Review
83, 114-140.

Healy, A. F. (1976) Detection errors on the word the: Evidence for reading
units larger than letters. Journal of Experimental Psychology: Human
Perception and Performance 2, 235-242.

Healy, A. F. and J. E. Cutting. (1976) Units of speech perception: Phoneme
and syllable. Journal of Verbal Learning and Verbal Behavior 15, 73-83.

Mermelstein, P. (1976) The syntax of acoustic segments. Conference Record,
1976 IEEE International Conference on Acoustics, Speech and Signal
Processing, pp. 33-36.

Mermelstein, P. and S. Levinson. (1976) Speech recognition: Acoustic,
phonetic and formal-language models. In Proceedings of the Fourth New England
Bioengineering Conference, ed. by S. Saha. (New York: Pergamon Press),
pp. 475-477.



APPENDIX

DDC (Defense Documentation Center) and ERIC (Educational Resources Information
Center) numbers:

SR-21/22 to SR-44

Status Report                            DDC           ERIC

SR-21/22   January - June 1970           AD 719382     ED-044-679
SR-23      July - September 1970         AD 723586     ED-052-654
SR-24      October - December 1970       AD 727616     ED-052-653
SR-25/26   January - June 1971           AD 730013     ED-056-560
SR-27      July - September 1971         AD 749339     ED-071-533
SR-28      October - December 1971       AD 742140     ED-061-837
SR-29/30   January - June 1972           AD 750001     ED-071-484
SR-31/32   July - December 1972          AD 757954     ED-077-285
SR-33      January - March 1973          AD 762373     ED-081-263
SR-34      April - June 1973             AD 766178     ED-081-295
SR-35/36   July - December 1973          AD            ED-094-4^4
SR-37/38   January - June 1974           AD 783548     ED-094-445
SR-39/40   July - December 1974          AD A007342    ED-102-633
SR-41      January - March 1975          AD A103325    ED-109-722
SR-42/43   April - September 1975        AD A018369    ED-117-770
SR-44      October - December 1975       AD A023059    ED-119-273



AD numbers may be ordered from:  U.S. Department of Commerce
                                 National Technical Information Service
                                 5285 Port Royal Road
                                 Springfield, Virginia 22151

ED numbers may be ordered from:  ERIC Document Reproduction Service
                                 Computer Microfilm International Corp. (CMIC)
                                 P.O. Box 190
                                 Arlington, Virginia 22210

Haskins Laboratories Status Report on Speech Research is abstracted in Language
and Behavior Abstracts, P.O. Box 22206, San Diego, California 92122.




UNCLASSIFIED
Security Classification

DOCUMENT CONTROL DATA - R & D

1. ORIGINATING ACTIVITY (Corporate author)
   Haskins Laboratories, Inc.
   270 Crown Street
   New Haven, Connecticut 06510

2a. REPORT SECURITY CLASSIFICATION
   Unclassified

2b. GROUP

3. REPORT TITLE
   Haskins Laboratories Status Report on Speech Research, No. 45/46,
   January - June 1976

4. DESCRIPTIVE NOTES (Type of report and inclusive dates)
   Interim Scientific Report

5. AUTHOR(S) (First name, middle initial, last name)
   Staff of Haskins Laboratories; Alvin M. Liberman, P.I.

6. REPORT DATE
   May 1976

7a. TOTAL NO. OF PAGES
   241

7b. NO. OF REFS
   316

8. CONTRACT OR GRANT NO.
   HD-01994
   V101(134)P-342
   N00014-76-C-0591
   DAAB03-75-C-0419(L433)
   N01-HD-1-2420
   RR-5596

9a. ORIGINATOR'S REPORT NUMBER(S)
   SR-45/46 (1976)

9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)
   None

10. DISTRIBUTION STATEMENT
   Distribution of this document is unlimited.*

11. SUPPLEMENTARY NOTES
   N/A

12. SPONSORING MILITARY ACTIVITY
   See No. 8

13. ABSTRACT

This report (1 January - 30 June 1976) is one of a regular series on the status
and progress of studies on the nature of speech, instrumentation for its
investigation, and practical applications. Manuscripts cover the following
topics:

Exploring Relations between Reading and Speech
Interpreting Error Pattern in Beginning Reading
Comments on Session: Perception and Production of Speech II; Conference on
   Origins and Evolution of Language and Speech
Consonant Environment Specifies Vowel Identity
What Information Enables Listener to Map Talker's Vowel Space?
Identification Dichotic Fusions
Discrimination Dichotic Fusions
Coperception: Two Further Preliminary Studies
"Posner's Paradigm" and Categorical Perception: Negative Study
Weak Syllables in Primitive Reading-Machine Algorithm
Control Fundamental Frequency, Intensity, Register of Phonation
Effect of Delayed Auditory Feedback on Phonation: Electromyographic Study
Some Aspects of Coarticulation
Function of Strap Muscles in Speech
Laryngeal Muscle Activity in Stuttering

*This document contains no information not freely available to the general
public. It is distributed primarily for library use.

UNCLASSIFIED
Security Classification



UNCLASSIFIED
Security Classification

14. KEY WORDS

Reading and Speech - Relations
Reading Errors - Interpretation
Speech Production: Language Evolution
Vowel Identity: Consonant Environment
Vowel Space - Mapping
Dichotic Fusions - Identification
Dichotic Fusions - Discrimination
Coperception
Categorical Perception - Posner's Paradigm
Syllables Weak - Algorithm
Phonation, Register, Fundamental Frequency, Intensity
Phonation, Delayed Feedback - Study
Coarticulation - Aspects
Strap Muscles - Function
Stuttering - Muscle Activity

DD FORM 1473 (BACK)

UNCLASSIFIED
Security Classification



